(54) Common collaboration initiator in multimedia collaboration system

(57) A collaboration system that integrates separate real-time and
asynchronous networks - the former for real-time audio and video, and
the latter for control signals and textual, graphical and other data -
in a manner which closely approximates the experience of face-to-face
collaboration. These capabilities are achieved by exploiting a variety
of hardware, software and networking technologies in a manner that
preserves the quality and integrity of audio/video/data and other
multimedia information, even after wide area transmission, and at a
significantly reduced networking cost as compared to what would be
required by presently known approaches. The system architecture is
readily scalable to the largest enterprise network environments. It
accommodates differing levels of collaborative capabilities available
to individual users and permits high-quality audio and video
capabilities to be readily superimposed onto existing personal
computers and workstations (12) and their interconnecting LANs (10)
and WANs (15). In the case of a plurality of geographically dispersed
LANs (10) interconnected by a WAN (15), the demands made on the WAN
are significantly reduced by employing multi-hopping techniques,
including avoiding the unnecessary decompression of data at
intermediate hops, as well as video mosaicing and cut-and-paste
technology.
Description

BACKGROUND OF THE INVENTION 

The present invention relates to computer-based systems for enhancing
collaboration between and among individuals who are separated by
distance and/or time (referred to herein as "distributed
collaboration"). Principal among the invention's goals is to replicate
in a desktop environment, to the maximum extent possible, the full
range, level and intensity of interpersonal communication and
information sharing which would occur if all the participants were
together in the same room at the same time (referred to herein as
"face-to-face collaboration").

[0001] It is well known to behavioral scientists that interpersonal
communication involves a large number of subtle and complex visual
cues, referred to by names like "eye contact" and "body language,"
which provide additional information over and above the spoken words
and explicit gestures. These cues are, for the most part, processed
subconsciously by the participants, and often control the course of a
meeting.
[0002] In addition to spoken words, demonstrative gestures and
behavioral cues, collaboration often involves the sharing of visual
information - e.g., printed material such as articles, drawings,
photographs, charts and graphs, as well as videotapes and
computer-based animations, visualizations and other displays - in such
a way that the participants can collectively and interactively
examine, discuss, annotate and revise the information. This
combination of spoken words, gestures, visual cues and interactive
data sharing significantly enhances the effectiveness of collaboration
in a variety of contexts, such as "brainstorming" sessions among
professionals in a particular field, consultations between one or more
experts and one or more clients, sensitive business or political
negotiations and the like. In distributed collaboration settings,
then, where the participants cannot be in the same place at the same
time, the beneficial effects of face-to-face collaboration will be
realized only to the extent that each of the remotely located
participants can be "recreated" at each site.
[0003] To illustrate the difficulties inherent in reproducing the
beneficial effects of face-to-face collaboration in a distributed
collaboration environment, consider the case of decision-making in the
fast-moving commodities trading markets, where many thousands of
dollars of profit (or loss) may depend on an expert trader making the
right decision within hours, or even minutes, of receiving a request
from a distant client. The expert requires immediate access to a wide
range of potentially relevant information such as financial data,
historical pricing information, current price quotes, newswire
services, government policies and programs, economic forecasts,
weather reports, etc. Much of this information can be processed by the
expert in isolation. However, before making a decision to buy or sell,
he or she will frequently need to discuss the information with other
experts, who may be geographically dispersed, and with the client. One
or more of these other experts may be in a meeting, on another call,
or otherwise temporarily unavailable. In this event, the expert must
communicate "asynchronously" - to bridge time as well as distance.
[0004] As discussed below, prior art desktop videoconferencing systems
provide, at best, only a partial solution to the challenges of
distributed collaboration in real time, primarily because of their
lack of high-quality video (which is necessary for capturing the
visual cues discussed above) and their limited data sharing
capabilities. Similarly, telephone answering machines, voice mail, fax
machines and conventional electronic mail systems provide incomplete
solutions to the problems presented by deferred (asynchronous)
collaboration because they are totally incapable of communicating
visual cues, gestures, etc. and, like conventional videoconferencing
systems, are generally limited in the richness of the data that can be
exchanged.

[0005] It has been proposed to extend traditional videoconferencing
capabilities from conference centers, where groups of participants
must assemble in the same room, to the desktop, where individual
participants may remain in their office or home. Such a system is
disclosed in U.S. Patent No. 4,710,917 to Tompkins et al. for Video
Conferencing Network issued on December 1, 1987. It has also been
proposed to augment such video conferencing systems with limited
"video mail" facilities. However, such dedicated videoconferencing
systems (and extensions thereof) do not effectively leverage the
investment in existing embedded information infrastructures - such as
desktop personal computers and workstations, local area network (LAN)
and wide area network (WAN) environments, building wiring, etc. - to
facilitate interactive sharing of data in the form of text, images,
charts, graphs, recorded video, screen displays and the like. That is,
they attempt to add computing capabilities to a videoconferencing
system, rather than adding multimedia and collaborative capabilities
to the user's existing computer system. Thus, while such systems may
be useful in limited contexts, they do not provide the capabilities
required for maximally effective collaboration, and are not
cost-effective.

[0006] Conversely, audio and video capture and processing capabilities
have recently been integrated into desktop and portable personal
computers and workstations (hereinafter generically referred to as
"workstations"). These capabilities have been used primarily in
desktop multimedia authoring systems for producing CD-ROM-based works.
While such systems are capable of processing, combining, and recording
audio, video and data locally (i.e., at the desktop), they do not
adequately support networked collaborative environments, principally
due to the substantial bandwidth requirements for real-time
transmission of high-quality, digitized audio and full-motion video
which preclude conventional LANs from supporting more than a few
workstations. Thus, although currently available desktop multimedia
computers frequently include videoconferencing and other multimedia or
collaborative capabilities within their advertised feature set (see,
e.g., A. Reinhardt, "Video Conquers the Desktop," BYTE, September
1993, pp. 64-90), such systems have not yet solved the many problems
inherent in any practical implementation of a scalable collaboration
system.

SUMMARY OF THE INVENTION

[0007] In accordance with the present invention, computer hardware,
software and communications technologies are combined in novel ways to
produce a multimedia collaboration system that greatly facilitates
distributed collaboration, in part by replicating the benefits of
face-to-face collaboration. The system tightly integrates a carefully
selected set of multimedia and collaborative capabilities, principal
among which are desktop teleconferencing and multimedia mail.
[0008] As used herein, desktop teleconferencing includes real-time
audio and/or video teleconferencing, as well as data conferencing.
Data conferencing, in turn, includes snapshot sharing (sharing of
"snapshots" of selected regions of the user's screen), application
sharing (shared control of running applications), shared whiteboard
(equivalent to sharing a "blank" window), and associated telepointing
and annotation capabilities. Teleconferences may be recorded and
stored for later playback, including both audio/video and all data
interactions.

[0009] While desktop teleconferencing supports real-time interactions,
multimedia mail permits the asynchronous exchange of arbitrary
multimedia documents, including previously recorded teleconferences.
Indeed, it is to be understood that the multimedia capabilities
underlying desktop teleconferencing and multimedia mail also greatly
facilitate the creation, viewing, and manipulation of high-quality
multimedia documents in general, including animations and
visualizations that might be developed, for example, in the course of
information analysis and modeling. Further, these animations and
visualizations may be generated for individual rather than
collaborative use, such that the present invention has utility beyond
a collaboration context.
[0010] The invention provides for a collaborative multimedia
workstation (CMW) system wherein very high-quality audio and video
capabilities can be readily superimposed onto an enterprise's existing
computing and network infrastructure, including workstations, LANs,
WANs, and building wiring.
[0011] In a preferred embodiment, the system architecture employs
separate real-time and asynchronous networks - the former for
real-time audio and video, and the latter for non-real-time audio and
video, text, graphics and other data, as well as control signals.
These networks are interoperable across different computers (e.g.,
Macintosh, Intel-based PCs, and Sun workstations), operating systems
(e.g., Apple System 7, DOS/Windows, and UNIX) and network operating
systems (e.g., Novell Netware and Sun ONC+). In many cases, both
networks can actually share the same cabling and wall jack connector.
[0012] The system architecture also accommodates the situation in
which the user's desktop computing and/or communications equipment
provides varying levels of media-handling capability. For example, a
collaboration session - whether real-time or asynchronous - may
include participants whose equipment provides capabilities ranging
from audio only (a telephone) or data only (a personal computer with a
modem) to a full complement of real-time, high-fidelity audio and
full-motion video, and high-speed data network facilities.
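
The mixed-capability session described in paragraph [0012] can be pictured with a minimal sketch. The media labels, the `Participant` record and the `media_plan` helper are hypothetical names introduced for illustration only; the specification does not define how capabilities are represented or matched.

```python
from dataclasses import dataclass

# Hypothetical media labels; the description names the range of
# capabilities but defines no identifiers for them.
AUDIO, VIDEO, DATA = "audio", "video", "data"


@dataclass(frozen=True)
class Participant:
    name: str
    capabilities: frozenset  # subset of {AUDIO, VIDEO, DATA}


def media_plan(participants):
    """Map each medium to the participants equipped to take part in it."""
    plan = {}
    for medium in (AUDIO, VIDEO, DATA):
        members = [p.name for p in participants if medium in p.capabilities]
        # Assumption: a medium is only set up when at least two
        # endpoints share it.
        if len(members) >= 2:
            plan[medium] = members
    return plan


session = [
    Participant("telephone", frozenset({AUDIO})),
    Participant("pc_with_modem", frozenset({DATA})),
    Participant("cmw_a", frozenset({AUDIO, VIDEO, DATA})),
    Participant("cmw_b", frozenset({AUDIO, VIDEO, DATA})),
]
plan = media_plan(session)
```

In this sketch the telephone user joins the audio portion only and the modem user joins the data portion only, while the two fully equipped workstations share all three media.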
[0013] The CMW system architecture is readily scalable to very large
enterprise-wide network environments accommodating thousands of users.
Further, it is an open architecture that can accommodate appropriate
standards. Finally, the CMW system incorporates an intuitive, yet
powerful, user interface, making the system easy to learn and use.

[0014] The present invention thus provides a distributed multimedia
collaboration environment that achieves the benefits of face-to-face
collaboration as nearly as possible, leverages ("snaps on to")
existing computing and network infrastructure to the maximum extent
possible, scales to very large networks consisting of thousands of
workstations, accommodates emerging standards, and is easy to learn
and use. The specific nature of the invention, as well as its objects,
features, advantages and uses, will become more readily apparent from
the following detailed description and examples, and from the
accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

[0015]

Figure 1 is a diagrammatic representation of a multimedia
collaboration system embodiment of the present invention.

Figures 2A and 2B are representations of a computer screen
illustrating, to the extent possible in a still image, the full-motion
video and related user interface displays which may be generated
during operation of a preferred embodiment of the invention.

Figure 3 is a block and schematic diagram of a preferred embodiment of
a "multimedia local area network" (MLAN) of the present invention.

Figure 4 is a block and schematic diagram illustrating how a plurality
of geographically dispersed MLANs of the type shown in Figure 3 can be
connected via a wide area network in accordance with the present
invention.

Figure 5 is a schematic diagram illustrating how collaboration sites
at distant locations L1-L8 are conventionally interconnected over a
wide area network by individually connecting each site to every other
site.

Figure 6 is a schematic diagram illustrating how collaboration sites
at distant locations L1-L8 are interconnected over a wide area network
in an embodiment of the invention using a multi-hopping approach.

Figure 7 is a block diagram illustrating an embodiment of video
mosaicing circuitry provided in the MLAN of Figure 3.

Figures 8A, 8B and 8C illustrate the video window on a typical
computer screen which may be generated during operation of the present
invention, and which contains only the callee for two-party calls (8A)
and a video mosaic of all participants, e.g., for four-party (8B) or
eight-party (8C) conference calls.

Figure 9 is a block diagram illustrating an embodiment of audio mixing
circuitry provided in the MLAN of Figure 3.

Figure 10 is a block diagram illustrating video cut-and-paste
circuitry provided in the MLAN of Figure 3.

Figure 11 is a schematic diagram illustrating typical operation of the
video cut-and-paste circuitry in Figure 10.

Figures 12-17 (consisting of Figures 12A, 12B, 13A, 13B, 14A, 14B,
15A, 15B, 16, 17A and 17B) illustrate various examples of how the
present invention provides video mosaicing, video cut-and-pasting, and
audio mixing at a plurality of distant sites for transmission over a
wide area network in order to provide, at the CMW of each conference
participant, video images and audio captured from the other conference
participants.

Figures 18A and 18B illustrate two different embodiments of a CMW
which may be employed in accordance with the present invention.

Figure 19 is a schematic diagram of an embodiment of a CMW add-on box
containing integrated audio and video I/O circuitry in accordance with
the present invention.

Figure 20 illustrates CMW software in accordance with an embodiment of
the present invention, integrated with standard multitasking operating
system and applications software.

Figure 21 illustrates software modules which may be provided for
running on the MLAN Server in the MLAN of Figure 3 for controlling
operation of the AV and Data Networks.

Figure 22 illustrates an enlarged example of "speed-dial" face icons
of certain collaboration participants in a Collaboration Initiator
window on a typical CMW screen which may be generated during operation
of the present invention.

Figure 23 is a diagrammatic representation of the basic operating
events occurring in a preferred embodiment of the present invention
during initiation of a two-party call.



Figure 24 is a block and schematic diagram illustrating how physical
connections are established in the MLAN of Figure 3 for physically
connecting first and second workstations for a two-party
videoconference call.

Figure 25 is a block and schematic diagram illustrating how physical
connections are established in MLANs such as illustrated in Figure 3,
for a two-party call between a first CMW located at one site and a
second CMW located at a remote site.

Figures 26 and 27 are block and schematic diagrams illustrating how
conference bridging is provided in the MLAN of Figure 3.

Figure 28 diagrammatically illustrates how a snapshot with annotations
may be stored in a plurality of bitmaps during data sharing.

Figure 29 is a schematic and diagrammatic illustration of the
interaction among multimedia mail (MMM), multimedia call/conference
recording (MMCR) and multimedia document management (MMDM) facilities.

Figure 30 is a schematic and diagrammatic illustration of the
multimedia document architecture employed in an embodiment of the
invention.

Figure 31A illustrates a centralized Audio/Video Storage Server.

Figure 31B is a schematic and diagrammatic illustration of the
interactions between the Audio/Video Storage Server and the remainder
of the CMW System.

Figure 31C illustrates an alternative embodiment of the interactions
illustrated in Figure 31B.

Figure 31D is a schematic and diagrammatic illustration of the
integration of MMM, MMCR and MMDM facilities in an embodiment of the
invention.

Figure 32 illustrates a generalized hardware implementation of a
scalable Audio/Video Storage Server.

Figure 33 illustrates a higher throughput version of the server
illustrated in Figure 32, using SCSI-based crosspoint switching to
increase the number of possible simultaneous file transfers.

Figure 34 illustrates the resulting multimedia collaboration
environment achieved by the integration of audio/video/data
teleconferencing and MMCR, MMM and MMDM.

Figures 35-42 illustrate a series of CMW screens which may be
generated during operation of the present invention for a typical
scenario involving a remote expert who takes advantage of many of the
features provided by the present invention.

DETAILED DESCRIPTION OF THE PREFERRED 
EMBODIMENTS 


OVERALL SYSTEM ARCHITECTURE 

[0016] Referring initially to Figure 1, illustrated therein is an
overall diagrammatic view of a multimedia collaboration system in
accordance with the present invention. As shown, each of a plurality
of "multimedia local area networks" (MLANs) 10 connects, via lines 13,
a plurality of CMWs 12-1 to 12-10 and provides audio/video/data
networking for supporting collaboration among CMW users. WAN 15 in
turn connects multiple MLANs 10, and typically includes appropriate
combinations of common carrier analog and digital transmission
networks. Multiple MLANs 10 on the same physical premises may be
connected via bridges/routers 11, as shown, to WANs and one another.

[0017] In accordance with the present invention, the system of Figure
1 accommodates both "real time" delay- and jitter-sensitive signals
(e.g., real-time audio and video teleconferencing) and classical
asynchronous data (e.g., data control signals as well as shared
textual, graphics and other media) communication among multiple CMWs
12 regardless of their location. Although only ten CMWs 12 are
illustrated in Figure 1, it will be understood that many more could be
provided. As also indicated in Figure 1, various other multimedia
resources 16 (e.g., VCRs, laserdiscs, TV feeds, etc.) are connected to
MLANs 10 and are thereby accessible by individual CMWs 12.

[0018] CMW 12 in Figure 1 may use any of a variety of types of
operating systems, such as Apple System 7, UNIX, DOS/Windows and OS/2.
The CMWs can also have different types of window systems. Specific
embodiments of a CMW 12 are described hereinafter in connection with
Figures 18A and 18B. Note that this invention allows for a mix of
operating systems and window systems across individual CMWs.
[0019] CMW 12 provides real-time audio/video/data capabilities along
with the usual data processing capabilities provided by its operating
system. For example, Fig. 2A illustrates a CMW screen containing live,
full-motion video of three conference participants, while Figure 2B
illustrates data shared and annotated by those conferees (lower left
window). CMW 12 provides for bidirectional communication, via lines
13, within MLAN 10, for audio/video signals as well as data signals.
Audio/video signals transmitted from a CMW 12 typically comprise a
high-quality live video image and audio of the CMW operator. These
signals are obtained from a video camera and microphone provided at
the CMW (via an add-on unit or partially or totally integrated into
the CMW), processed, and then made available to low-cost network
transmission subsystems.
[0020] Audio/video signals received by a CMW 12 from MLAN 10 may
typically include: video images of one or more conference participants
and associated audio, video and audio from multimedia mail, previously
recorded audio/video from previous calls and conferences, and standard
broadcast television (e.g., CNN). Received video signals are displayed
on the CMW screen or on an adjacent monitor, and the accompanying
audio is reproduced by a speaker provided in or near the CMW. In
general, the required transducers and signal processing hardware could
be integrated into the CMW, or be provided via a CMW add-on unit, as
appropriate.

[0021] In the preferred embodiment, it has been found particularly
advantageous to provide the above-described video at standard
NTSC-quality TV performance (i.e., 30 frames per second at 640x480
pixels per frame and the equivalent of 24 bits of color per pixel)
with accompanying high-fidelity audio (typically between 7 and 15
kHz).
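
As a rough worked example of why video at this quality strains a conventional data LAN, the figures quoted above can be multiplied out to an uncompressed digital bit rate. This is an illustrative calculation only: the preferred embodiment carries real-time video in analog form, and the comparison against the 10 Mbit/s nominal rate of 10BaseT Ethernet is an assumption added here, not a figure from the specification.

```python
def raw_video_bitrate(width, height, bits_per_pixel, frames_per_second):
    """Uncompressed video bit rate in bits per second."""
    return width * height * bits_per_pixel * frames_per_second


# Figures quoted in the description: 640x480 pixels, 24 bits, 30 frames/s.
video_bps = raw_video_bitrate(640, 480, 24, 30)

# Assumption for comparison: nominal 10BaseT Ethernet signalling rate.
ETHERNET_10BASET_BPS = 10_000_000

ratio = video_bps / ETHERNET_10BASET_BPS
print(f"{video_bps / 1e6:.1f} Mbit/s uncompressed, "
      f"about {ratio:.0f}x a 10 Mbit/s LAN")
```

Under these assumptions a single uncompressed stream needs roughly 221 Mbit/s, which is one way to read the motivation for a separate real-time audio/video path.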

MULTIMEDIA LOCAL AREA NETWORK 

[0022] Referring next to Figure 3, illustrated therein is a preferred
embodiment of MLAN 10 having ten CMWs (12-1 to 12-10), coupled therein
via lines 13a and 13b. MLAN 10 typically extends over a distance from
a few hundred feet to a few miles, and is usually located within a
building or a group of proximate buildings.

[0023] Given the current state of networking technologies, it is
useful (for the sake of maintaining quality and minimizing costs) to
provide separate signal paths for real-time audio/video and classical
asynchronous data communications (including digitized audio and video
enclosures of multimedia mail messages that are free from real-time
delivery constraints). At the moment, analog methods for carrying
real-time audio/video are preferred. In the future, digital methods
may be used. Eventually, digital audio and video signal paths may be
multiplexed with the data signal path as a common digital stream.
Another alternative is to multiplex real-time and asynchronous data
paths together using analog multiplexing methods. For the purposes of
illustration, however, these two signal paths are treated as using
physically separate wires. Further, as this embodiment uses analog
networking for audio and video, it also physically separates the
real-time and asynchronous switching vehicles and, in particular,
assumes an analog audio/video switch. In the future, a common
switching vehicle (e.g., ATM) could be used.
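
The division of traffic between the two signal paths in paragraph [0023] can be summarized in a minimal sketch. The `Path` enumeration and the `select_path` function are hypothetical names used only to restate the rule in the text; the specification does not describe such a function.

```python
from enum import Enum


class Path(Enum):
    AV_NETWORK = "real-time audio/video path"
    DATA_NETWORK = "asynchronous data path"


def select_path(media: str, real_time: bool) -> Path:
    """Pick the signal path for a stream, following the rule in the text.

    Only audio and video that must be delivered in real time use the
    separate audio/video path. Everything else, including digitized
    audio/video mail enclosures and control signals, uses the data path.
    """
    if media in {"audio", "video"} and real_time:
        return Path.AV_NETWORK
    return Path.DATA_NETWORK


live_video = select_path("video", real_time=True)
mail_enclosure = select_path("video", real_time=False)
control = select_path("control", real_time=False)
```

Here a live conference feed maps to the audio/video path, while a recorded video enclosure in a mail message and a control signal both map to the data path.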
[0024] The MLAN 10 thus can be implemented in the preferred embodiment
using conventional technology, such as typical Data LAN hubs 25 and
A/V Switching Circuitry 30 (as used in television studios and other
closed-circuit television networks), linked to the CMWs 12 via
appropriate transceivers and unshielded twisted pair (UTP) wiring.
Note in Figure 1 that lines 13, which interconnect each CMW 12 within
its respective MLAN 10, comprise two sets of lines 13a and 13b. Lines
13a provide bidirectional communication of audio/video within MLAN 10,
while lines 13b provide for the bidirectional communication of data.
This separation permits conventional LANs to be used for data
communications and a supplemental network to be used for audio/video
communications. Although this separation is advantageous in the
preferred embodiment, it is again to be understood that
audio/video/data networking can also






EP0 898 424 A2 






be implemented using a single pair of lines for both audio/video and
data communications via a very wide variety of analog and digital
multiplexing schemes.
[0025] While lines 13a and 13b may be implemented in various ways, it
is currently preferred to use commonly installed 4-pair UTP telephone
wires, wherein one pair is used for incoming video with accompanying
audio (mono or stereo) multiplexed in, wherein another pair is used
for outgoing multiplexed audio/video, and wherein the remaining two
pairs are used for carrying incoming and outgoing data in ways
consistent with existing LANs. For example, 10BaseT Ethernet uses
RJ-45 pins 1, 2, 4, and 6, leaving pins 3, 5, 7, and 8 available for
the two A/V twisted pairs. The resulting system is compatible with
standard (AT&T 258A, EIA/TIA 568, 8P8C, 10BaseT, ISDN, 6P6C, etc.)
telephone wiring found commonly throughout telephone and LAN cable
plants in most office buildings throughout the world. These UTP wires
are used in a hierarchy or peer arrangements of star topologies to
create MLAN 10, described below. Note that the distance range of the
data wires often must match that of the video and audio. Various
UTP-compatible data LAN networks may be used, such as Ethernet, token
ring, FDDI, ATM, etc. For distances longer than the maximum distance
specified by the data LAN protocol, data signals can be additionally
processed for proper UTP operations.
[0026] As shown in Figure 3, lines 13a from each CMW 12 are coupled to
a conventional Data LAN hub 25, which facilitates the communication of
data (including control signals) among such CMWs. Lines 13b in Figure
3 are connected to A/V Switching Circuitry 30. One or more conference
bridges 35 are coupled to A/V Switching Circuitry 30 and possibly (if
needed) the Data LAN hub 25, via lines 35b and 35a, respectively, for
providing multi-party conferencing in a particularly advantageous
manner, as will hereinafter be described in detail. A WAN gateway 40
provides for bidirectional communication between MLAN 10 and WAN 15 in
Figure 1. For this purpose, Data LAN hub 25 and A/V Switching
Circuitry 30 are coupled to WAN gateway 40 via outputs 25a and 30a,
respectively. Other devices connect to the A/V Switching Circuitry 30
and Data LAN hub 25 to add additional features (such as multimedia
mail, conference recording, etc.) as discussed below.
[0027] Control of A/V Switching Circuitry 30, conference bridges 35
and WAN gateway 40 in Figure 3 is provided by MLAN Server 60 via lines
60b, 60c, and 60d, respectively. In one embodiment, MLAN Server 60
supports the TCP/IP network protocol suite. Accordingly, software
processes on CMWs 12 communicate with one another and MLAN Server 60
via MLAN 10 using these protocols. Other network protocols could also
be used, such as IPX. The manner in which software running on MLAN
Server 60 controls the operation of MLAN 10 will be described in
detail hereinafter.
[0028] Note in Figure 3 that Data LAN hub 25, A/V Switching Circuitry
30 and MLAN Server 60 also provide respective lines 25b, 30b, and 60e
for coupling to additional multimedia resources 16 (Figure 1), such as
multimedia document management, multimedia databases, radio/TV
channels, etc. Data LAN hub 25 (via bridges/routers 11 in Figure 1)
and A/V Switching Circuitry 30 additionally provide lines 25c and 30c
for coupling to one or more other MLANs 10 which may be in the same
locality (i.e., not far enough away to require use of WAN technology).
Where WANs are required, WAN gateways 40 are used to provide highest
quality compression methods and standards in a shared resource
fashion, thus minimizing costs at the workstation for a given WAN
quality level, as discussed below.
[0029] The basic operation of the preferred embodiment of the
resulting collaboration system shown in Figures 1 and 3 will next be
considered. Important features of the present invention reside in
providing not only multi-party real-time desktop audio/video/data
teleconferencing among geographically distributed CMWs, but also in
providing from the same desktop audio/video/data/text/graphics mail
capabilities, as well as access to other resources, such as databases,
audio and video files, overview cameras, standard TV channels, etc.
Fig. 2B illustrates a CMW screen showing a multimedia EMAIL mailbox
(top left window) containing references to a number of received
messages along with a video enclosure (top right window) to the
selected message.

[0030] Returning to Figures 1 and 3, A/V Switching Circuitry 30
(whether digital or analog as in the preferred embodiment) provides
common audio/video switching for CMWs 12, conference bridges 35, WAN
gateway 40 and multimedia resources 16, as determined by MLAN Server
60, which in turn controls conference bridges 35 and WAN gateway 40.
Similarly, asynchronous data is communicated within MLAN 10 utilizing
common data communications formats where possible (e.g., for snapshot
sharing) so that the system can handle such data in a common manner,
regardless of origin, thereby facilitating multimedia mail and data
sharing as well as audio/video communications.
[0031] For example, to provide multi-party teleconferencing, an
initiating CMW 12 signals MLAN Server 60 via Data LAN hub 25
identifying the desired conference participants. After determining
which of these conferees will accept the call, MLAN Server 60 controls
A/V Switching Circuitry 30 (and CMW software via the data network) to
set up the required audio/video and data paths to conferees at the
same location as the initiating CMW.
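
The local call-setup sequence in paragraph [0031] can be sketched as follows. The `AVSwitch` and `MLANServer` classes, the `will_accept` callback and the workstation names are hypothetical stand-ins for illustration; the sketch connects the initiator directly to each accepting conferee and omits the conference bridges 35 that the specification uses for multi-party mixing.

```python
class AVSwitch:
    """Toy stand-in for A/V Switching Circuitry 30: a set of one-way paths."""

    def __init__(self):
        self.paths = set()

    def connect(self, source, sink):
        self.paths.add((source, sink))


class MLANServer:
    """Toy stand-in for MLAN Server 60 controlling the switch."""

    def __init__(self, switch, will_accept):
        self.switch = switch
        # Hypothetical callback: asks a conferee whether it accepts the call.
        self.will_accept = will_accept

    def place_call(self, initiator, invitees):
        # Step 1: determine which of the desired conferees accept.
        accepted = [cmw for cmw in invitees if self.will_accept(cmw)]
        # Step 2: set up audio/video paths in both directions.
        for callee in accepted:
            self.switch.connect(initiator, callee)
            self.switch.connect(callee, initiator)
        return accepted


switch = AVSwitch()
server = MLANServer(switch, will_accept=lambda cmw: cmw != "cmw-3")
accepted = server.place_call("cmw-1", ["cmw-2", "cmw-3"])
```

In this run "cmw-3" declines, so the switch ends up holding only the two one-way paths between "cmw-1" and "cmw-2".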

[0032] When one or more conferees are at distant locations, the
respective MLAN Servers 60 of the involved MLANs 10, on a peer-to-peer
basis, control their respective A/V Switching Circuitry 30, conference
bridges 35, and WAN gateways 40 to set up appropriate communication
paths (via WAN 15 in Figure 1) as required for interconnecting the
conferees. MLAN Servers 60 also communicate with one another via data
paths so that each MLAN 10 contains updated information as to the
capabilities of all of the system CMWs 12, and also the current
locations of all parties available for teleconferencing.

[0033] The data conferencing component of the above-described system
supports the sharing of visual information at one or more CMWs (as
described in greater detail below). This encompasses both "snapshot
sharing" (sharing "snapshots" of complete or partial screens, or of
one or more selected windows) and "application sharing" (sharing both
the control and display of running applications). When transferring
images, lossless or slightly lossy image compression can be used to
reduce network bandwidth requirements and user-perceived delay while
maintaining high image quality.

[0034] In all cases, any participant can point at or
annotate the shared data. These associated telepointers
and annotations appear on every participant's CMW screen
as they are drawn (i.e., effectively in real time). For
example, note Figure 2B, which illustrates a typical CMW
screen during a multi-party teleconferencing session,
wherein the screen contains annotated shared data as well
as video images of the conferees. As described in greater
detail below, all or portions of the audio/video and data
of the teleconference can be recorded at a CMW (or within
MLAN 10), complete with all the data interactions.

[0035] In the above-described preferred embodiment,
audio/video file services can be implemented either at the
individual CMWs 12 or by employing a centralized
audio/video storage server. This is one example of the
many types of additional servers that can be added to the
basic system of MLANs 10. A similar approach is used for
incorporating other multimedia services, such as
commercial TV channels, multimedia mail, multimedia
document management, multimedia conference recording,
visualization servers, etc. (as described in greater
detail below). Certainly, applications that run
self-contained on a CMW can be readily added, but the
invention extends this capability greatly in the way that
MLAN 10, storage and other functions are implemented and
leveraged.

[0036] In particular, standard signal formats, network
interfaces, user interface messages, and call models can
allow virtually any multimedia resource to be smoothly
integrated into the system. Factors facilitating such
smooth integration include: (i) a common mechanism for
user access across the network; (ii) a common metaphor
(e.g., placing a call) for the user to initiate use of
such resource; (iii) the ability for one function (e.g., a
multimedia conference or multimedia database) to access
and exchange information with another function (e.g.,
multimedia mail); and (iv) the ability to extend such
access of one networked function by another networked
function to relatively complex nestings of simpler
functions (for example, record a multimedia conference in
which a group of users has accessed multimedia mail
messages and transferred them to a multimedia database,
and then send part of the conference recording just
created as a new multimedia mail message, utilizing a
multimedia mail editor if necessary).
[0037] A simple example of the smooth integration of
functions made possible by the above-described approach is
that the GUI and software used for snapshot sharing
(described below) can also be used as an input/output
interface for multimedia mail and more general forms of
multimedia documents. This can be accomplished by
structuring the interprocess communication protocols to be
uniform across all these applications. More complicated
examples (specifically multimedia conference recording,
multimedia mail and multimedia document management) will
be presented in detail below.

WIDE AREA NETWORK 

[0038] Next to be described in connection with Figure 4 is
the advantageous manner in which the present invention
provides for real-time audio/video/data communication
among geographically dispersed MLANs 10 via WAN 15
(Figure 1), whereby communication delays, cost and
degradation of video quality are significantly reduced
from what would otherwise be expected.
[0039] Four MLANs 10 are illustrated at locations A, B, C
and D. CMWs 12-1 to 12-10, A/V Switching Circuitry 30,
Data LAN hub 25, and WAN gateway 40 at each location
correspond to those shown in Figures 1 and 3. Each WAN
gateway 40 in Figure 4 will be seen to comprise a
router/codec (R&C) bank 42 coupled to WAN 15 via WAN
switching multiplexer 44. The router is used for data
interconnection and the codec is used for audio/video
interconnection (for multimedia mail and document
transmission, as well as videoconferencing). Codecs from
multiple vendors, or supporting various compression
algorithms, may be employed. In the preferred embodiment,
the router and codec are combined with the switching
multiplexer to form a single integrated unit.

[0040] Typically, WAN 15 is comprised of T1 or ISDN
common-carrier-provided digital links (switched or
dedicated), in which case WAN switching multiplexers 44
are of the appropriate type (T1, ISDN, fractional T1, T3,
switched 56 Kbps, etc.). Note that the WAN switching
multiplexer 44 typically creates subchannels whose
bandwidth is a multiple of 64 Kbps (i.e., 256 Kbps, 384,
768, etc.) among the T1, T3 or ISDN carriers. Inverse
multiplexers may be required when using 56 Kbps dedicated
or switched services from these carriers.
[0041] In the MLAN 10 to WAN 15 direction, router/codec
bank 42 in Figure 4 provides conventional
analog-to-digital conversion and compression of
audio/video signals received from A/V Switching Circuitry
30 for transmission to WAN 15 via WAN switching
multiplexer 44, along with transmission and routing of
data signals received from Data LAN hub 25. In the WAN 15
to MLAN 10 direction, each router/codec bank 42 in
Figure 4 provides digital-to-analog conversion and
decompression of audio/video digital signals received from
WAN 15 via WAN switching multiplexer 44 for transmission
to A/V Switching Circuitry 30, along with the transmission
to Data LAN hub 25 of data signals received from WAN 15.

[0042] The system also provides optimal routes for
audio/video signals through the WAN. For example, in
Figure 4, location A can take either a direct route to
location D via path 47, or a two-hop route through
location C via paths 48 and 49. If the direct path 47
linking location A and location D is unavailable, the
multipath route via location C and paths 48 and 49 could
be used.
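
The fallback from the direct route to the two-hop route can be sketched as a search over whichever WAN links are currently available. The following is only an illustration: the function name `find_route`, the representation of paths 47, 48 and 49 as location pairs, and the use of a fewest-hops criterion are all assumptions made for this sketch, not details given in the description.

```python
from collections import deque

def find_route(links, src, dst):
    """Breadth-first search: return a fewest-hop route over the available links."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path
        for nxt in sorted(graph.get(node, ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no route over the available links

# Hypothetical encoding of Figure 4: path 47 = A-D, path 48 = A-C, path 49 = C-D.
all_links = {("A", "D"), ("A", "C"), ("C", "D")}
print(find_route(all_links, "A", "D"))                 # ['A', 'D']
print(find_route(all_links - {("A", "D")}, "A", "D"))  # ['A', 'C', 'D']
```

A fewest-hops search is only one possible policy; a load-based criterion could be substituted by replacing the breadth-first search with a weighted shortest-path search.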
[0043] In a more complex network, several multi-hop routes
are typically available, in which case the routing system
handles the decision making, which for example can be
based on network loading considerations. Note the
resulting two-level network hierarchy: a MLAN 10 to
MLAN 10 (i.e., site-to-site) service connecting codecs
with one another only at connection endpoints.
[0044] The cost savings made possible by providing the
above-described multi-hop capability (with intermediate
codec bypassing) are very significant, as will become
evident by noting the examples of Figures 5 and 6.
Figure 5 shows that using the conventional "fully
connected mesh" location-to-location approach, thirty-six
WAN links are required for interconnecting the nine
locations L1 to L8. On the other hand, using the above
multi-hop capabilities, only nine WAN links are required,
as shown in Figure 6. As the number of locations
increases, the difference in cost becomes even greater.
For example, for 100 locations, the conventional approach
would require about 5,000 WAN links, while the multi-hop
approach of the present invention would typically require
300 or fewer (possibly considerably fewer) WAN links.
Although specific WAN links for the multi-hop approach of
the invention would require higher bandwidth to carry the
additional traffic, the cost involved is very much smaller
as compared to the cost for the very much larger number of
WAN links required by the conventional approach.
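
The link counts quoted for the fully connected mesh follow from the number of distinct location pairs, n(n-1)/2. A short check, with `full_mesh_links` being a name chosen here purely for illustration:

```python
def full_mesh_links(n: int) -> int:
    """Links needed to connect every pair of n locations directly."""
    return n * (n - 1) // 2

print(full_mesh_links(9))    # 36, the Figure 5 case
print(full_mesh_links(100))  # 4950, i.e. "about 5,000"
```

Against these figures, the nine links of Figure 6 and the "300 or fewer" links cited for 100 locations represent reductions by factors of four and of more than sixteen, respectively.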
[0045] At the endpoints of a wide-area call, the WAN
switching multiplexer routes audio/video signals directly
from the WAN network interface through an available codec
to MLAN 10 and vice versa. At intermediate hops in the
network, however, video signals are routed from one
network interface on the WAN switching multiplexer to
another network interface. Although A/V Switching
Circuitry 30 could be used for this purpose, the preferred
embodiment provides switching functionality inside the WAN
switching multiplexer. By doing so, it avoids having to
route audio/video signals through codecs to the analog
switching circuitry, thereby avoiding additional codec
delays at the intermediate locations.

[0046] A product capable of performing the basic switching
functions described above for WAN switching multiplexer 44
is available from Teleos Corporation, Eatontown, New
Jersey (U.S.A.). This product is not known to have been
used for providing audio/video multi-hopping and dynamic
switching among various WAN links as described above.

[0047] In addition to the above-described multiple-hop
approach, the present invention provides a particularly
advantageous way of minimizing delay, cost and degradation
of video quality in a multi-party video teleconference
involving geographically dispersed sites, while still
delivering full conference views of all participants.
Normally, in order for the CMWs at all sites to be
provided with live audio/video of every participant in a
teleconference simultaneously, each site has to allocate
(in router/codec bank 42 in Figure 4) a separate codec for
each participant, as well as a like number of WAN trunks
(via WAN switching multiplexer 44 in Figure 4).
[0048] As will next be described, however, the preferred
embodiment of the invention advantageously permits each
wide area audio/video teleconference to use only one codec
at each site, and a minimum number of WAN digital trunks.
Basically, the preferred embodiment achieves this most
important result by employing "distributed" video
mosaicing via a video "cut-and-paste" technology along
with distributed audio mixing.

DISTRIBUTED VIDEO MOSAICING 

[0049] Figure 7 illustrates a preferred way of providing
video mosaicing in the MLAN of Figure 3, i.e., by
combining the individual analog video pictures from the
individuals participating in a teleconference into a
single analog mosaic picture. As shown in Figure 7, analog
video signals 112-1 to 112-n from the participants of a
teleconference are applied to video mosaicing circuitry
36, which in the preferred embodiment is provided as part
of conference bridge 35 in Figure 3. These analog video
inputs 112-1 to 112-n are obtained from the A/V Switching
Circuitry 30 (Figure 3) and may include video signals from
CMWs at one or more distant sites (received via WAN
gateway 40) as well as from other CMWs at the local site.

[0050] Video mosaicing circuitry 36, represented in block
form, is capable of receiving N individual analog video
picture signals (where N is a squared integer, i.e., 4, 9,
16, etc.). Circuitry 36 first reduces the size of the N
input video signals by reducing the resolution of each by
a factor of M (where M is the square root of N, i.e., 2,
3, 4, etc.), and then arranges them in an M-by-M mosaic of
N images. The resulting single analog mosaic 36a obtained
from video mosaicing circuitry 36 is then transmitted to
the individual CMWs for display on the screens thereof.
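
The reduce-then-arrange operation can be pictured with a small digital sketch. This is an illustration only: the circuitry described operates on analog video, whereas the code below models frames as nested lists of pixel values, reduces resolution by simple decimation, and assumes frame dimensions divisible by M. The names `downsample`, `make_mosaic` and `solid` are hypothetical.

```python
import math

def downsample(frame, m):
    """Reduce resolution by a factor of m, keeping every m-th pixel and row."""
    return [row[::m] for row in frame[::m]]

def make_mosaic(frames, blank=0):
    """Tile up to N = M*M equally sized frames into one frame of the same size."""
    if not frames:
        raise ValueError("need at least one frame")
    m = math.isqrt(len(frames) - 1) + 1  # smallest M with M*M >= len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    tile_h, tile_w = height // m, width // m
    mosaic = [[blank] * width for _ in range(height)]
    for index, frame in enumerate(frames):
        small = downsample(frame, m)
        top, left = (index // m) * tile_h, (index % m) * tile_w
        for r in range(tile_h):
            for c in range(tile_w):
                mosaic[top + r][left + c] = small[r][c]
    return mosaic

def solid(value, size=4):
    """A test frame filled with a single value."""
    return [[value] * size for _ in range(size)]

# Three participants in a 2x2 (N=4, M=2) mosaic; the fourth region stays blank.
mosaic = make_mosaic([solid("A"), solid("B"), solid("C")], blank=".")
for row in mosaic:
    print("".join(row))
# AABB
# AABB
# CC..
# CC..
```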

[0051] As will become evident hereinafter, it may be
preferable to send a different mosaic to distant sites, in
which case video mosaicing circuitry 36 would provide an
additional mosaic 36b for this purpose. A typical
displayed mosaic picture (N=4, M=2) showing three
participants is illustrated in Figure 2A. A mosaic
containing four participants is shown in Figure 8B. It
will be appreciated that, since a mosaic (36a or 36b) can
be transmitted as a single video picture to another site,
via WAN 15 (Figures 1 and 4), only one codec and digital
trunk are required. Of course, if only a single individual
video picture is required to be sent from a site, it may
be sent directly without being included in a mosaic.
[0052] Note that for large conferences it is possible to
employ multiple video mosaics, one for each video window
supported by the CMWs (see, e.g., Figure 8C). In very
large conferences, it is also possible to display video
only from a select focus group whose members are selected
by a dynamic "floor control" mechanism. Also note that,
with additional mosaic hardware, it is possible to give
each CMW its own mosaic. This can be used in small
conferences to raise the maximum number of participants
(from N to N + 1, i.e., 5, 10, 17, etc.) or to give
everyone in a large conference their own "focus group"
view.

[0053] Also note that the entire video mosaicing approach
described thus far and continued below applies should
digital video transmission be used in lieu of analog
transmission, particularly since both mosaic and video
window implementations use digital formats internally and
in current products are transformed to and from analog for
external interfacing. In particular, note that mosaicing
can be done digitally without decompression with many
existing compression schemes. Further, with an all-digital
approach, mosaicing can be done as needed directly on the
CMW.
[0054] Figure 9 illustrates audio mixing circuitry 38,
represented in block form, for use in conjunction with the
video mosaicing circuitry 36 in Figure 7, both of which
may be part of conference bridges 35 in Figure 3. As shown
in Figure 9, audio signals 114-1 to 114-n are applied to
audio summing circuitry 38 for combination. These input
audio signals 114-1 to 114-n may include audio signals
from local participants as well as audio sums from
participants at distant sites. Audio mixing circuitry 38
provides a respective "minus-1" sum output 38a-1, 38a-2,
etc. for each participant. Thus, each participant hears
every conference participant's audio except his/her own.
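
A "minus-1" mix can be sketched as follows. The sketch is illustrative only: it models audio as lists of integer samples and forms each output by subtracting a participant's own signal from the total, which is one convenient way to compute the sums and not necessarily how the mixing circuitry does so. The name `minus_one_mixes` is an assumption.

```python
def minus_one_mixes(signals):
    """Given {participant: samples}, return each participant's mix of all the others."""
    length = len(next(iter(signals.values())))
    total = [sum(s[i] for s in signals.values()) for i in range(length)]
    return {
        name: [t - x for t, x in zip(total, own)]
        for name, own in signals.items()
    }

signals = {"A": [1, 0, 0], "B": [0, 2, 0], "C": [0, 0, 3]}
mixes = minus_one_mixes(signals)
print(mixes["A"])  # [0, 2, 3] : A hears B and C but not A
```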

[0055] In the preferred embodiment, sums are decomposed
and formed in a distributed fashion, creating partial sums
at one site which are completed at other sites by
appropriate signal insertion. Accordingly, audio mixing
circuitry 38 is able to provide one or more additional
sums, such as indicated by output 38, for sending to other
sites having conference participants.
[0056] Next to be considered is the manner in which video
cut-and-paste techniques are advantageously employed in
the preferred embodiment. It will be understood that,
since video mosaics and/or individual video pictures may
be sent from one or more other sites, the problem arises
as to how these situations are handled. Video
cut-and-paste circuitry 39, as illustrated in Figure 10,
is provided for this purpose, and may also be incorporated
in the conference bridges 35 in Figure 3.
[0057] Referring to Figure 10, video cut-and-paste
circuitry 39 receives analog video inputs 116, which may
be comprised of one or more mosaics or single video
pictures received from one or more distant sites and a
mosaic or single video picture produced by the local site.
It is assumed that the local video mosaicing circuitry 36
(Figure 7) and the video cut-and-paste circuitry 39 have
the capability of handling all of the applied individual
video pictures, or at least are able to choose which ones
are to be displayed based on existing available signals.

[0058] The video cut-and-paste circuitry 39 digitizes the
incoming analog video inputs 116, selectively rearranges
the digital signals on a region-by-region basis to produce
a single digital M-by-M mosaic, having individual pictures
in selected regions, and then converts the resulting
digital mosaic back to analog form to provide a single
analog mosaic picture 39a for sending to local
participants (and other sites where required) having the
individual input video pictures in appropriate regions.
This resulting cut-and-paste analog mosaic 39a will
provide the same type of display as illustrated in
Figure 8B. As will become evident hereinafter, it is
sometimes beneficial to send different cut-and-paste
mosaics to different sites, in which case video
cut-and-paste circuitry 39 will provide additional
cut-and-paste mosaics 39b-1, 39b-2, etc. for this purpose.

[0059] Figure 11 diagrammatically illustrates an example
of how video cut-and-paste circuitry may operate to
provide the cut-and-paste analog mosaic 39a. As shown in
Figure 11, four digitized individual signals 116a, 116b,
116c derived from the input video signals are "pasted"
into selected regions of a digital frame buffer 17 to form
a digital 2x2 mosaic, which is converted into an output
analog video mosaic 39a or 39b in Figure 10. The required
audio partial sums may be provided by audio mixing
circuitry 38 in Figure 9 in the same manner, replacing
each cut-and-paste video operation with a partial sum
operation.
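
The region-by-region rearrangement can be sketched in the same digital terms. The example below is an assumption-laden illustration: frames are nested lists, the frame buffer is a copy of an imported mosaic, and the helper names `cut` and `paste`, the region coordinates, and the sample contents are all hypothetical.

```python
def cut(frame, top, left, height, width):
    """Copy a rectangular region out of a digitized frame."""
    return [row[left:left + width] for row in frame[top:top + height]]

def paste(buffer, region, top, left):
    """Write a region into the frame buffer at the given position."""
    for r, row in enumerate(region):
        buffer[top + r][left:left + len(row)] = row

# An imported mosaic with A and B in the top half and an empty bottom half.
imported = [list("AABB"), list("AABB"), list("...."), list("....")]
# A second imported picture carrying D in its upper-left quadrant.
remote = [list("DD.."), list("DD.."), list("...."), list("....")]
# The local picture C, already reduced to quadrant size.
local = [list("CC"), list("CC")]

frame_buffer = [row[:] for row in imported]
paste(frame_buffer, local, top=2, left=0)
paste(frame_buffer, cut(remote, 0, 0, 2, 2), top=2, left=2)
print(["".join(row) for row in frame_buffer])
# ['AABB', 'AABB', 'CCDD', 'CCDD']
```

Producing a different mosaic for a different destination site amounts to repeating the paste steps into a second buffer with a different choice of regions.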
[0060] Having described in connection with Figures 7-11
how video mosaicing, audio mixing, video cut-and-pasting,
and distributed audio mixing may be performed, the
following description of Figures 12-17 will illustrate how
these capabilities may advantageously be used in
combination in the context of wide-area videoconferencing.
For these examples, the teleconference is assumed to have
four participants designated as A, B, C and D, in which
case 2x2 (quad) mosaics are employed. It is to be
understood that greater numbers of participants could be
provided. Also, two or more simultaneously occurring
teleconferences could also be handled, in which case
additional mosaicing, cut-and-paste and audio mixing
circuitry would be provided at the various sites along
with additional WAN paths. For each example, the "A"
figure illustrates the video mosaicing and cut-and-pasting
provided, and the corresponding "B" figure (having the
same figure number) illustrates the associated audio
mixing provided. Note that these figures indicate typical
delays that might be encountered for each example (with a
single "UNIT" delay ranging from 0-450 milliseconds,
depending upon available compression technology).

[0061] Figures 12A and 12B illustrate a 2-site example
having two participants A and B at Site #1 and two
participants C and D at Site #2. Note that this example
requires mosaicing and cut-and-paste at both sites.
[0062] Figures 13A and 13B illustrate another 2-site
example, but having three participants A, B and C at
Site #1 and one participant D at Site #2. Note that this
example requires mosaicing at both sites, but
cut-and-paste only at Site #2.
[0063] Figures 14A and 14B illustrate a 3-site example
having participants A and B at Site #1, participant C at
Site #2, and participant D at Site #3. At Site #1, the
three local videos A, B and C are put into a mosaic which
is sent to both Site #2 and Site #3. At Site #2 and
Site #3, cut-and-paste is used to insert the single video
(C or D) at that site into the empty region in the
imported A, B, C mosaic, as shown. Accordingly, mosaicing
is required at all three sites, and cut-and-paste is
required for only Site #2 and Site #3.
[0064] Figures 15A and 15B illustrate another 3-site
example having participant A at Site #1, participant B at
Site #2, and participants C and D at Site #3. Note that
mosaicing and cut-and-paste are required at all sites.
Site #2 additionally has the capability to send different
cut-and-paste mosaics to Site #1 and Site #3. Further note
with respect to Figure 15B that Site #2 creates minus-1
audio mixes for Site #1 and Site #2, but only provides a
partial audio mix (A&B) for Site #3. These partial mixes
are completed at Site #3 by mixing in C's signal to
complete D's mix (A+B+C) and D's signal to complete C's
mix (A+B+D).
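
The completion of a partial mix at the receiving site can be sketched as follows, again modelling audio as lists of integer samples. The helper name `mix` and the sample values are assumptions for illustration.

```python
def mix(*signals):
    """Sample-wise sum of equally long signals."""
    return [sum(samples) for samples in zip(*signals)]

A, B, C, D = [1, 0, 0, 0], [0, 2, 0, 0], [0, 0, 3, 0], [0, 0, 0, 4]

partial_ab = mix(A, B)          # partial mix (A&B) sent to Site #3
mix_for_d = mix(partial_ab, C)  # completed at Site #3 for D: A+B+C
mix_for_c = mix(partial_ab, D)  # completed at Site #3 for C: A+B+D

print(mix_for_d)  # [1, 2, 3, 0]
print(mix_for_c)  # [1, 2, 0, 4]
```

Only the single partial sum crosses the WAN; the two site-specific mixes are formed locally from it.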

[0065] Figure 16 illustrates a 4-site example employing a
star topology, having one participant at each site; that
is, participant A is at Site #1, participant B is at
Site #2, participant C is at Site #3, and participant D is
at Site #4. An audio implementation is not illustrated for
this example, since standard minus-1 mixing can be
performed at Site #1, and the appropriate sums transmitted
to the other sites.
[0066] Figures 17A and 17B illustrate a 4-site example
that also has only one participant at each site, but uses
a line topology rather than a star topology as in the
example of Figure 16. Note that this example requires
mosaicing and cut-and-paste at all sites. Also note that
Site #2 and Site #3 are each required to transmit two
different types of cut-and-paste mosaics.
[0067] The preferred embodiment also provides the
capability of allowing a conference participant to select
a close-up of a participant displayed on a mosaic. This
capability is provided whenever a full individual video
picture is available at that user's site. In such case,
the A/V Switching Circuitry 30 (Figure 3) switches the
selected full video picture (whether obtained locally or
from another site) to the CMW that requests the close-up.

[0068] Next to be described in connection with Figures
18A, 18B, 19 and 20 are various embodiments of a CMW in
accordance with the invention.

COLLABORATIVE MULTIMEDIA WORKSTATION 
HARDWARE 

[0069] One embodiment of a CMW 12 of the present invention
is illustrated in Fig. 18A. Currently available personal
computers (e.g., an Apple Macintosh or an IBM-compatible
PC, desktop or laptop) and workstations (e.g., a Sun
SPARCstation) can be adapted to work with the present
invention to provide such features as real-time
videoconferencing, data conferencing, multimedia mail,
etc. In business situations, it can be advantageous to set
up a laptop to operate with reduced functionality via
cellular telephone links and removable storage media
(e.g., CD-ROM, video tape with timecode support, etc.),
but take on full capability back in the office via a
docking station connected to the MLAN 10. This requires a
voice and data modem as yet another function server
attached to the MLAN.
[0070] The currently available personal computers and
workstations serve as a base workstation platform. The
addition of certain audio and video I/O devices to the
standard components of the base platform 100 (where
standard components include the display monitor 200,
keyboard 300 and mouse or tablet (or other pointing
device) 400), all of which connect with the base platform
box through standard peripheral ports 101, 102 and 103,
enables the CMW to generate and receive real-time audio
and video signals. These devices include a video camera
500 for capturing the user's image, gestures and
surroundings (particularly the user's face and upper
body), a microphone 600 for capturing the user's spoken
words (and any other sounds generated at the CMW), a
speaker 700 for presenting incoming audio signals (such as
the spoken words of another participant to a
videoconference or audio annotations to a document), a
video input card 130 in the base platform 100 for
capturing incoming video signals (e.g., the image of
another participant to a videoconference, or videomail),
and a video display card 120 for displaying video and
graphical output on monitor 200 (where video is typically
displayed in a separate window).

[0071] These peripheral audio and video I/O devices are
readily available from a variety of vendors and are just
beginning to become standard features in (and often
physically integrated into the monitor and/or base
platform of) certain personal computers and workstations.
See, e.g., the aforementioned BYTE article ("Video
Conquers the Desktop"), which describes current models of
Apple's Macintosh AV series personal computers and Silicon
Graphics' Indy workstations.
[0072] Add-on box 800 (shown in Fig. 18A and illustrated
in greater detail in Fig. 19) integrates these audio and
video I/O devices with additional functions (such as
adaptive echo canceling and signal switching) and
interfaces with AV Network 901. AV Network 901 is the part
of the MLAN 10 which carries bidirectional audio and video
signals among the CMWs and A/V Switching Circuitry 30
(e.g., utilizing existing UTP wiring to carry audio and
video signals, digital or analog, as in the present
embodiment).
[0073] In the present embodiment, the AV network 901 is
separate and distinct from the Data Network 902 portion of
the MLAN 10, which carries bidirectional data signals
among the CMWs and the Data LAN hub (e.g., an Ethernet
network that also uses UTP wiring in the present
embodiment with a network interface card 110 in each CMW).
Note that each CMW will typically be a node on both the AV
and the Data Networks.
[0074] There are several approaches to implementing Add-on
box 800. In a typical videoconference, video camera 500
and microphone 600 capture and transmit outgoing video and
audio signals into ports 801 and 802, respectively, of
Add-on box 800. These signals are transmitted via
Audio/Video I/O port 805 across AV Network 901. Incoming
video and audio signals (from another videoconference
participant) are received across AV network 901 through
Audio/Video I/O port 805. The video signals are sent out
of V-OUT port 803 of CMW add-on box 800 to video input
card 130 of base platform 100, where they are displayed
(typically in a separate video window) on monitor 200
utilizing the standard base platform video display card
120. The audio signals are sent out of A-OUT port 804 of
CMW add-on box 800 and played through speaker 700 while
the video signals are displayed on monitor 200. The same
signal flow occurs for other non-teleconferencing
applications of audio and video.
[0075] Add-on box 800 can be controlled by CMW software
(illustrated in Fig. 20) executed by base platform 100.
Control signals can be communicated between base platform
port 104 and Add-on box Control port 806 (e.g., an RS-232,
Centronics, SCSI or other standard communications port).
[0076] Many other embodiments of the CMW illustrated in
Fig. 18A will work in accordance with the present
invention. For example, Add-on box 800 itself can be
implemented as an add-in card to the base platform 100.
Connections to the audio and video I/O devices need not
change, though the connection for base platform control
can be implemented internally (e.g., via the system bus)
rather than through an external RS-232 or SCSI peripheral
port. Various additional levels of integration can also be
achieved, as will be evident to those skilled in the art.
For example, microphones, speakers, video cameras and UTP
transceivers can be integrated into the base platform 100
itself, and all media handling technology and
communications can be integrated onto a single card.



[0077] A handset/headset jack enables the use of an
integrated audio I/O device as an alternate to the
separate microphone and speaker. A telephone interface
could be integrated into add-on box 800 as a local
implementation of computer-integrated telephony. A "hold"
(i.e., audio and video mute) switch and/or a separate
audio mute switch could be added to Add-on box 800 if such
an implementation were deemed preferable to a
software-based interface.
[0078] The internals of Add-on box 800 of Fig. 18A are
illustrated in Fig. 19. Video signals generated at the CMW
(e.g., captured by camera 500 of Fig. 18A) are sent to CMW
add-on box 800 via V-IN port 801. They then typically pass
unaffected through Loopback/AV Mute circuitry 830 via
video ports 833 (input) and 834 (output) and into A/V
Transceivers 840 (via Video In port 842) where they are
transformed from standard video cable signals to UTP
signals and sent out via port 845 and Audio/Video I/O port
805 onto AV Network 901.
[0079] The Loopback/AV Mute circuitry 830 can, however, be
placed in various modes under software control via Control
port 806 (implemented, for example, as a standard UART).
If in loopback mode (e.g., for testing incoming and
outgoing signals at the CMW), the video signals would be
routed back out V-OUT port 803 via video port 831. If in a
mute mode (e.g., muting audio, video or both), video
signals might, for example, be disconnected and no video
signal would be sent out video port 834. Loopback and
muting switching functionality is also provided for audio
in a similar way. Note that computer control of loopback
is very useful for remote testing and diagnostics, while
manual override of computer control on mute is effective
for assured privacy from use of the workstation for
electronic spying.
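
The mode behavior for outgoing video can be summarized in a small model. This is a sketch only: the `Mode` enumeration, the function names, and the string descriptions of the signal paths are assumptions introduced for illustration, and the actual circuitry is switching hardware under UART control rather than software.

```python
from enum import Enum

class Mode(Enum):
    NORMAL = "normal"
    LOOPBACK = "loopback"
    MUTE = "mute"

def effective_mode(software_mode, manual_mute):
    """A manual mute switch overrides whatever mode software has requested."""
    return Mode.MUTE if manual_mute else software_mode

def route_outgoing_video(mode):
    """Where video arriving at video port 833 goes in each mode."""
    if mode is Mode.NORMAL:
        return "video port 834 -> A/V Transceivers 840"
    if mode is Mode.LOOPBACK:
        return "video port 831 -> V-OUT port 803"
    return None  # muted: no video signal is sent out video port 834

print(route_outgoing_video(effective_mode(Mode.NORMAL, manual_mute=False)))
print(route_outgoing_video(effective_mode(Mode.LOOPBACK, manual_mute=True)))
```

Giving the manual switch priority in `effective_mode` reflects the privacy rationale stated above: remote software cannot re-enable a camera that the user has muted by hand.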
[0080] Video input (e.g., captured by the video camera at
the CMW of another videoconference participant) is handled
in a similar fashion. It is received along AV Network 901
through Audio/Video I/O port 805 and port 845 of A/V
Transceivers 840, where it is sent out Video Out port 841
to video port 832 of Loopback/AV Mute circuitry 830, which
typically passes such signals out video port 831 to V-OUT
port 803 (for receipt by a video input card or other
display mechanism, such as LCD display 810 of CMW Side
Mount unit 850 in Fig. 18B, to be discussed).

[0081] Audio input and output (e.g., for playback through
speaker 700 and capture by microphone 600 of Fig. 18A)
passes through A/V Transceivers 840 (via Audio In port 844
and Audio Out port 843) and Loopback/AV Mute circuitry 830
(through audio ports 837/838 and 836/835) in a similar
manner. The audio input and output ports of Add-on box 800
interface with standard amplifier and equalization
circuitry, as well as an adaptive room echo canceler 814
to eliminate echo, minimize feedback and provide enhanced
audio performance when using a separate microphone and
speaker. In particular, use of adaptive room echo
cancelers provides high-quality audio interactions in wide
area conferences. Because adaptive room echo canceling
requires training periods (typically involving an
objectionable blast of high-amplitude white noise or tone
sequences) for alignment with each acoustic environment,
it is preferred that separate echo canceling be dedicated
to each workstation rather than sharing a smaller group of
echo cancelers across a larger group of workstations.

[0082] Audio inputs passing through audio port 835 of
Loopback/AV Mute circuitry 830 provide audio signals to a
speaker (via standard Echo Canceler circuitry 814 and
A-OUT port 804) or to a handset or headset (via I/O ports
807 and 808, respectively, under volume control circuitry
815 controlled by software through Control port 806). In
all cases, incoming audio signals pass through power
amplifier circuitry 812 before being sent out of Add-on
box 800 to the appropriate audio-emitting transducer.

[0083] Outgoing audio signals generated at the CMW (e.g.,
by microphone 600 of Fig. 18A or the mouthpiece of a
handset or headset) enter Add-on box 800 via A-IN port 802
(for a microphone) or Handset or Headset I/O ports 807 and
808, respectively. In all cases, outgoing audio signals
pass through standard preamplifier (811) and equalization
(813) circuitry, whereupon the desired signal is selected
by standard "Select" switching circuitry 816 (under
software control through Control port 806) and passed to
audio port 837 of Loopback/AV Mute circuitry 830.

[0084] It is to be understood that A/V Transceivers 840
may include muxing/demuxing facilities so as to enable the
transmission of audio/video signals on a single pair of
wires, e.g., by encoding audio signals digitally in the
vertical retrace interval of the analog video signal.
Implementation of other audio and video enhancements, such
as stereo audio and external audio/video I/O ports (e.g.,
for recording signals generated at the CMW), are also well
within the capabilities of one skilled in the art. If
stereo audio is used in teleconferencing (i.e., to create
useful spatial metaphors for users), a second echo
canceler may be recommended.
[0085] Another embodiment of the CMW of this invention,
illustrated in Fig. 18B, utilizes a separate (fully
self-contained) "Side Mount" approach which includes its
own dedicated video display. This embodiment is
advantageous in a variety of situations, such as
instances in which additional screen display area is
desired (e.g., in a laptop computer or desktop system
with a small monitor) or where it is impossible or
undesirable to retrofit older, existing or specialized
desktop computers for audio/video support. In this
embodiment, video camera 500, microphone 600 and speaker
700 of Fig. 18A are integrated together with the
functionality of Add-on box 800. Side Mount 850
eliminates the necessity of external connections to these
integrated audio and video I/O devices, and includes an
LCD display 810 for displaying the incoming video signal
(which thus eliminates the need for a base platform video
input card 130).

[0086] Given the proximity of Side Mount device 850
to the user, and the direct access to audio/video I/O
within that device, various additional controls 820 can
be provided at the user's touch (all well within the
capabilities of those skilled in the art). Note that, with
enough additions, Side Mount unit 850 can become virtually
a standalone device that does not require a separate
computer for services using only audio and video. This
also provides a way of supplementing a network of
full-feature workstations with a few low-cost additional
"audio video intercoms" for certain sectors of an
enterprise (such as clerical, reception, factory floor, etc.).
[0087] A portable laptop implementation can be made
to deliver multimedia mail with video, audio and
synchronized annotations via CD-ROM or an add-on
videotape unit with separate video, audio and time code
tracks (a stereo videotape player can use the second
audio channel for time code signals). Videotapes or
CD-ROMs can be created in main offices and express
mailed, thus avoiding the need for high-bandwidth
networking when on the road. Cellular phone links can be
used to obtain both voice and data communications (via
modems). Modem-based data communications are sufficient
to support remote control of mail or presentation
playback, annotation, file transfer and fax features. The
laptop can then be brought into the office and attached
to a docking station where the available MLAN 10 and
additional functions adapted from Add-on box 800 can
be supplied, providing full CMW capability.

COLLABORATIVE MULTIMEDIA WORKSTATION 
SOFTWARE 

[0088] CMW software modules 160 are illustrated
generally in Fig. 20 and discussed in greater detail
below in conjunction with the software running on MLAN
Server 60 of Fig. 3. Software 160 allows the user to
initiate and manage (in conjunction with the server
software) videoconferencing, data conferencing,
multimedia mail and other collaborative sessions with
other users across the network.
[0089] Also present on the CMW in this embodiment
are standard multitasking operating system/GUI software
180 (e.g., Apple Macintosh System 7, Microsoft
Windows 3.1, or UNIX with the "X Window System" and
Motif or other GUI "window manager" software) as well
as other applications 170, such as word processing and
spreadsheet programs. Software modules 161-168
communicate with operating system/GUI software 180
and other applications 170 utilizing standard function
calls and interapplication protocols.
[0090] The central component of the Collaborative 
Multimedia Workstation software is the Collaboration 
Initiator 161. All collaborative functions can be 
accessed through this module. When the Collaboration 
Initiator is started, it exchanges initial configuration 
information with the Audio Video Network Manager
(AVNM) 60 (shown in Fig. 3) through Data Network 902.
Information is also sent from the Collaboration Initiator 
to the AVNM indicating the location of the user, the 
types of services available on that workstation (e.g., vid- 
eoconferencing, data conferencing, telephony, etc.) and 
other relevant initialization information. 
[0091] The Collaboration Initiator presents a user
interface that allows the user to initiate collaborative
sessions (both real-time and asynchronous). In the
preferred embodiment, session participants can be
selected from a graphical rolodex 163 that contains a
scrollable list of user names or from a list of quick-dial
buttons 162. Quick-dial buttons show the face icons for
the users they represent. In the preferred embodiment,
the icon representing the user is retrieved by the
Collaboration Initiator from the Directory Server 66 on
MLAN Server 60 when it starts up. Users can dynamically
add new quick-dial buttons by dragging the corresponding
entries from the graphical rolodex onto the quick-dial
panel.

[0092] Once the user elects to initiate a collaborative
session, he or she selects one or more desired
participants by, for example, clicking on a name to select
the desired participant from the system rolodex or a
personal rolodex, or by clicking on the quick-dial button
for that participant (see, e.g., Fig. 2A). In either case,
the user then selects the desired session type, e.g., by
clicking on a CALL button to initiate a videoconference
call, a SHARE button to initiate the sharing of a snapshot
image or blank whiteboard, or a MAIL button to
send mail. Alternatively, the user can double-click on the
rolodex name or a face icon to initiate the default
session type, e.g., an audio/video conference call.
[0093] The system also allows sessions to be invoked
from the keyboard. It provides a graphical editor to bind
combinations of participants and session types to
certain hot keys. Pressing such a hot key (possibly in
conjunction with a modifier key, e.g., <Shift> or <Ctrl>)
will cause the Collaboration Initiator to start a session
of the specified type with the given participants.
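
By way of illustration only, the binding of hot keys to combinations of participants and session types might be sketched as follows in Python. The class names, method names and key labels are hypothetical and are not taken from the described embodiment.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class SessionBinding:
    """A combination of participants and a session type (hypothetical names)."""
    participants: Tuple[str, ...]
    session_type: str  # e.g. "CALL", "SHARE" or "MAIL"


class HotKeyTable:
    """Maps (modifier, key) pairs to session bindings."""

    def __init__(self) -> None:
        self._bindings: Dict[Tuple[Optional[str], str], SessionBinding] = {}

    def bind(self, key: str, binding: SessionBinding,
             modifier: Optional[str] = None) -> None:
        self._bindings[(modifier, key)] = binding

    def lookup(self, key: str,
               modifier: Optional[str] = None) -> Optional[SessionBinding]:
        # Returns None when the key combination is not bound.
        return self._bindings.get((modifier, key))


table = HotKeyTable()
table.bind("F5", SessionBinding(("alice", "bob"), "CALL"), modifier="Ctrl")
```

A lookup with the same key and modifier yields the stored binding, which a Collaboration Initiator could then use to start the session; an unbound combination yields nothing.
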
[0094] Once the user selects the desired participant
and session type, Collaboration Initiator module 161
retrieves necessary addressing information from
Directory Service 66 (see Fig. 21). In the case of a
videoconference call, the Collaboration Initiator (or, in
another embodiment, VideoPhone module 169) then
communicates with the AVNM (as described in greater detail
below) to set up the necessary data structures and
manage the various states of that call, and to control
A/V Switching Circuitry 30, which selects the appropriate
audio and video signals to be transmitted to/from
each participant's CMW. In the case of a data
conferencing session, the Collaboration Initiator locates,
via the AVNM, the Collaboration Initiator modules at the
CMWs of the chosen recipients, and sends a message
causing the Collaboration Initiator modules to invoke the
Snapshot Sharing modules 164 at each participant's
CMW. Subsequent videoconferencing and data conferencing
functionality is discussed in greater detail below
in the context of particular usage scenarios.
[0095] As indicated previously, additional collaborative
services (such as Mail 165, Application Sharing 166,
Computer-Integrated Telephony 167 and Computer
Integrated Fax 168) are also available from the CMW
by using Collaboration Initiator module 161 to initiate
the session (i.e., to contact the participants) and to
invoke the appropriate application necessary to manage
the collaborative session. When initiating asynchronous
collaboration (e.g., mail, fax, etc.), the
Collaboration Initiator contacts Directory Service 66 for
address information (e.g., EMAIL address, fax number,
etc.) for the selected participants and invokes the
appropriate collaboration tools with the obtained address
information. For real-time sessions, the Collaboration
Initiator queries the Service Server module 69 inside
AVNM 63 for the current location of the specified
participants. Using this location information, it
communicates (via the AVNM) with the Collaboration
Initiators of the other session participants to coordinate
session setup. As a result, the various Collaboration
Initiators will invoke modules 166, 167 or 168 (including
activating any necessary devices such as the connection
between the telephone and the CMW's audio I/O port).
Further details on multimedia mail are provided below.
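
The distinction drawn above between asynchronous and real-time session setup might be sketched as follows. This is a minimal Python illustration under stated assumptions: the two dictionaries merely stand in for Directory Service 66 and Service Server 69, and the function name, step labels and sample addresses are invented for the example.

```python
ASYNC_TYPES = {"mail", "fax"}  # assumption: labels for asynchronous sessions


def plan_session(session_type, participants, directory, locations):
    """Return one setup step per participant.

    directory: participant -> {session_type: address} (stand-in for a directory lookup)
    locations: participant -> current CMW location (stand-in for a service lookup)
    """
    steps = []
    for person in participants:
        if session_type in ASYNC_TYPES:
            # Asynchronous: an address is enough to invoke the tool.
            address = directory[person][session_type]
            steps.append(("invoke_tool", session_type, address))
        else:
            # Real-time: the participant's current location is needed.
            steps.append(("coordinate_setup", session_type, locations[person]))
    return steps


directory = {"bob": {"mail": "bob@example.com", "fax": "555-0100"}}
locations = {"bob": "cmw-17"}
mail_steps = plan_session("mail", ["bob"], directory, locations)
call_steps = plan_session("telephony", ["bob"], directory, locations)
```
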

MLAN SERVER SOFTWARE 

[0096] Figure 21 diagrammatically illustrates software
62 comprised of various modules (as discussed above)
provided for running on MLAN Server 60 (Figure 3) in
the preferred embodiment. It is to be understood that
additional software modules could also be provided. It is
also to be understood that, although the software
illustrated in Figure 21 offers various significant
advantages, as will become evident hereinafter, different
forms and arrangements of software may also be employed
within the scope of the invention. The software can also
be implemented in various sub-parts running as separate
processes.

[0097] In one embodiment, clients (e.g.,
software-controlling workstations, VCRs, laserdisks,
multimedia resources, etc.) communicate with the MLAN
Server Software Modules 62 using the TCP/IP network
protocols. Generally, the AVNM 63 cooperates with the
Service Server 69, Conference Bridge Manager (CBM 64 in
Figure 21) and the WAN Network Manager (WNM 65 in
Figure 21) to manage communications within and
among both MLANs 10 and WANs 15 (Figures 1 and 3).
[0098] The AVNM additionally cooperates with
Audio/Video Storage Server 67 and other multimedia
services 68 in Figure 21 to support various types of
collaborative interactions as described herein. CBM 64 in
Figure 21 operates as a client of the AVNM 63 to manage
conferencing by controlling the operation of conference
bridges 35. This includes management of the
video mosaicing circuitry 37, audio mixing circuitry 38
and cut-and-paste circuitry 39 preferably incorporated
therein. WNM 65 manages the allocation of paths
(codecs and trunks) provided by WAN gateway 40 for
accomplishing the communications to other sites called
for by the AVNM.

Audio Video Network Manager

[0099] The AVNM 63 manages A/V Switching Circuitry 30
in Figure 3 for selectively routing audio/video
signals to and from CMWs 12, and also to and from
WAN gateway 40, as called for by clients. Audio/video
devices (e.g., CMWs 12, conference bridges 35, multimedia
resources 16 and WAN gateway 40 in Figure 3)
connected to A/V Switching Circuitry 30 in Figure 3
have physical connections for audio in, audio out, video
in and video out. For each device on the network, the
AVNM combines these four connections into a port
abstraction, wherein each port represents an addressable
bidirectional audio/video channel. Each device connected
to the network has at least one port. Different
ports may share the same physical connections on the
switch. For example, a conference bridge may typically
have four ports (for 2x2 mosaicing) that share the same
video-out connection. Not all devices need both video
and audio connections at a port. For example, a TV
tuner port needs only incoming audio/video connections.
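
The port abstraction described above might be modeled, purely as a sketch, by a small Python record type holding up to four physical connections. The field names, the connection labels and the choice of which tuner fields are populated are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Port:
    """Addressable A/V channel; each field names a physical switch connection."""
    name: str
    audio_in: Optional[str] = None
    audio_out: Optional[str] = None
    video_in: Optional[str] = None
    video_out: Optional[str] = None


# Four bridge ports for a 2x2 mosaic, all sharing one video-out connection.
bridge_ports = [
    Port(f"bridge-{i}", audio_in=f"sw-a-in-{i}", audio_out=f"sw-a-out-{i}",
         video_in=f"sw-v-in-{i}", video_out="sw-v-out-mosaic")
    for i in range(1, 5)
]

# A TV tuner port carries connections in one direction only
# (which two fields are used is an assumption of this sketch).
tuner = Port("tv-tuner", audio_in="sw-a-in-tv", video_in="sw-v-in-tv")
```
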

[0100] In response to client program requests, the
AVNM provides connectivity between audio/video
devices by connecting their ports. Connecting ports is
achieved by switching one port's physical input
connections to the other port's physical output connections
(for both audio and video) and vice-versa. Client programs
can specify which of the four physical connections on
their ports should be switched. This allows client programs
to establish unidirectional calls (e.g., by specifying that
only the port's input connections should be switched
and not the port's output connections) and audio-only or
video-only calls (by specifying audio connections only
or video connections only).
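
One way to picture the selective switching described above is as a set of flags, one per physical connection, from which the required switch routes are derived. The following Python sketch is illustrative only; the flag names, the `connect` function and the route strings are hypothetical.

```python
from enum import IntFlag


class Conn(IntFlag):
    AUDIO_IN = 1
    AUDIO_OUT = 2
    VIDEO_IN = 4
    VIDEO_OUT = 8


ALL = Conn.AUDIO_IN | Conn.AUDIO_OUT | Conn.VIDEO_IN | Conn.VIDEO_OUT
AUDIO_ONLY = Conn.AUDIO_IN | Conn.AUDIO_OUT
INPUT_ONLY = Conn.AUDIO_IN | Conn.VIDEO_IN  # a unidirectional call


def connect(a: str, b: str, mode: Conn = ALL):
    """Return the (source, sink) switch routes needed to connect port a to port b.

    mode selects which of port a's four connections are switched.
    """
    routes = []
    for medium, flag_in, flag_out in (("audio", Conn.AUDIO_IN, Conn.AUDIO_OUT),
                                      ("video", Conn.VIDEO_IN, Conn.VIDEO_OUT)):
        if mode & flag_in:   # a's input is fed from b's output
            routes.append((f"{b}.{medium}_out", f"{a}.{medium}_in"))
        if mode & flag_out:  # a's output feeds b's input
            routes.append((f"{a}.{medium}_out", f"{b}.{medium}_in"))
    return routes
```

With the default mode all four connections are switched; `AUDIO_ONLY` yields only the two audio routes, and `INPUT_ONLY` yields routes that terminate exclusively at port `a`.
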

Service Server 

[0101] Before client programs can access audio/video
resources through the AVNM, they must register the
collaborative services they provide with the Service Server
69. Examples of these services include "video call",
"snapshot sharing", "conference" and "video file
sharing." These service records are entered into the Service
Server's service database. The service database thus
keeps track of the location of client programs and the
types of collaborative sessions in which they can
participate. This allows the Collaboration Initiator to find
collaboration participants no matter where they are
located. The service database is replicated by all
Service Servers: Service Servers communicate with other
Service Servers in other MLANs throughout the system
to exchange their service records.
[0102] Clients may create a plurality of services, 
depending on the collaborative capabilities desired. 
When creating a service, a client can specify the
network resources (e.g., ports) that will be used by this
service. In particular, service information is used to
associate a user with the audio/video ports physically 
connected to the particular CMW into which the user is 
logged in. Clients that want to receive requests do so by 
putting their services in listening mode. If clients want to 
accept incoming data shares, but want to block incom- 
ing video calls, they must create different services. 
[0103] A client can create an exclusive service on a
set of ports to prevent other clients from creating
services on these ports. This is useful, for example, to
prevent multiple conference bridges from managing the
same set of conference bridge ports.
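
The service records, listening mode and exclusive services described above might be sketched as follows. This Python fragment is a toy model under stated assumptions: the class, its methods and the record fields are hypothetical, and the conflict rule (an exclusive service blocks other clients on overlapping ports) is one reading of the text.

```python
class ServiceServer:
    """Toy service database (hypothetical API, for illustration only)."""

    def __init__(self):
        self.records = []

    def create_service(self, client, service_type, ports=(), exclusive=False):
        wanted = set(ports)
        for rec in self.records:
            # An exclusive service blocks other clients on overlapping ports.
            if rec["exclusive"] and rec["client"] != client and wanted & rec["ports"]:
                raise ValueError("ports held by an exclusive service")
        rec = {"client": client, "type": service_type, "ports": wanted,
               "exclusive": exclusive, "listening": False}
        self.records.append(rec)
        return rec

    def find_listening(self, client, service_type):
        for rec in self.records:
            if (rec["client"], rec["type"]) == (client, service_type) and rec["listening"]:
                return rec
        return None


server = ServiceServer()
share = server.create_service("alice", "snapshot sharing", ports={"p1"})
video = server.create_service("alice", "video call", ports={"p1"})
share["listening"] = True   # accept incoming data shares
# The video service is left out of listening mode, so video calls are blocked.

server.create_service("cbm-1", "conference", ports={"b1", "b2"}, exclusive=True)
try:
    server.create_service("cbm-2", "conference", ports={"b2"})
    conflict = False
except ValueError:
    conflict = True
```
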
[0104] Next to be considered is the preferred manner
in which the AVNM 63 (Figure 21), in cooperation with
the Service Server 69, CBM 64 and participating CMWs,
provides for managing A/V Switching Circuitry 30 and
conference bridges 35 in Figure 3 during
audio/video/data teleconferencing. The participating
CMWs may include workstations located at both local
and remote sites.

BASIC TWO-PARTY VIDEOCONFERENCING 

[0105] As previously described, a CMW includes a
Collaboration Initiator software module 161 (see Fig.
20) which is used to establish person-to-person and
multiparty calls. The corresponding collaboration
initiator window advantageously provides quick-dial face
icons of frequently dialed persons, as illustrated, for
example, in Figure 22, which is an enlarged view of
typical face icons along with various initiating buttons
(described in greater detail below in connection with
Figs. 35-42).

[0106] Videoconference calls can be initiated, for
example, merely by double-clicking on these icons.
When a call is initiated, the CMW typically provides a
screen display that includes a live video picture of the
remote conference participant, as illustrated for
example in Figure 8A. In the preferred embodiment, this
display also includes control buttons/menu items that can
be used to place the remote participant on hold, to
resume a call on hold, to add one or more participants
to the call, to initiate data sharing and to hang up the
call.

[0107] The basic underlying software-controlled
operations occurring for a two-party call are
diagrammatically illustrated in Figure 23. After logging in
to AVNM 63, as indicated by (1) in Figure 23, a caller
initiates a call (e.g., by selecting a user from the
graphical rolodex and clicking the call button or by
double-clicking the face icon of the callee on the
quick-dial panel). The caller's Collaboration Initiator
responds by identifying the selected user and requesting
that user's address from Directory Service 66, as
indicated by (2) in Figure 23. Directory Service 66 looks
up the callee's address in the directory database, as
indicated by (3) in Figure 23, and then returns it to the
caller's Collaboration Initiator, as illustrated by (4)
in Figure 23.
[0108] The caller's Collaboration Initiator sends a
request to the AVNM to place a video call to the callee
with the specified address, as indicated by (5) in Figure
23. The AVNM queries the Service Server to find the
service instance of type "video call" whose name
corresponds to the callee's address. This service record
identifies the location of the callee's Collaboration
Initiator as well as the network ports that the callee is
connected to. If no service instance is found for the
callee, the AVNM notifies the caller that the callee is not
logged in. If the callee is local, the AVNM sends a call
event to the callee's Collaboration Initiator, as indicated
by (6) in Figure 23. If the callee is at a remote site, the
AVNM forwards the call request (5) through the WAN
gateway 40 for transmission, via WAN 15 (Figure 1), to the
Collaboration Initiator of the callee's CMW at the remote
site.
[0109] The callee's Collaboration Initiator can respond
to the call event in a variety of ways. In the preferred
embodiment, a user-selectable sound is generated to
announce the incoming call. The Collaboration Initiator
can then act in one of two modes. In "Telephone Mode,"
the Collaboration Initiator displays an invitation
message on the CMW screen that contains the name of the
caller and buttons to accept or refuse the call. The
Collaboration Initiator will then accept or refuse the
call, depending on which button is pressed by the callee.
In "Intercom Mode," the Collaboration Initiator accepts
all incoming calls automatically, unless there is already
another call active on the callee's CMW, in which case
behavior reverts to Telephone Mode.
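
The two answering modes just described reduce to a small decision rule, sketched here in Python for illustration. The function name, the mode labels and the `ask_user` callback are hypothetical stand-ins for the invitation dialog.

```python
def respond_to_call(mode, call_active, ask_user):
    """Return True to accept an incoming call, False to refuse it.

    mode: "telephone" or "intercom" (labels are assumptions)
    call_active: whether another call is already active on this CMW
    ask_user: callable that shows the invitation and returns the user's choice
    """
    if mode == "intercom" and not call_active:
        return True        # auto-accept
    return ask_user()      # Telephone Mode, or Intercom Mode while busy
```
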
[0110] The callee's Collaboration Initiator then notifies
the AVNM as to whether the call will be accepted or
refused. If the call is accepted (7), the AVNM sets up
the necessary communication paths between the caller
and the callee required to establish the call. The AVNM
then notifies the caller's Collaboration Initiator that the
call has been established by sending it an accept event
(8). If the caller and callee are at different sites, their
AVNMs will coordinate in setting up the communication
paths at both sites, as required by the call.
[0111] The AVNM may provide for managing connections
among CMWs and other multimedia resources for
audio/video/data communications in various ways. The
manner employed in the preferred embodiment will next
be described.
[0112] As has been described previously, the AVNM
manages the switches in the A/V Switching Circuitry 30
in Figure 3 to provide port-to-port connections in
response to connection requests from clients. The
primary data structure used by the AVNM for managing
these connections will be referred to as a callhandle,
which is comprised of a plurality of bits, including state
bits.
[0113] Each port-to-port connection managed by the
AVNM comprises two callhandles, one associated with
each end of the connection. The callhandle at the client
port of the connection permits the client to manage the
client's end of the connection. The callhandle mode bits
determine the current state of the callhandle and which
of a port's four switch connections (video in, video out,
audio in, audio out) are involved in a call.
[0114] AVNM clients send call requests to the AVNM
whenever they want to initiate a call. As part of a call
request, the client specifies the local service in which
the call will be involved, the name of the specific port to
use for the call, identifying information as to the callee,
and the call mode. In response, the AVNM creates a
callhandle on the caller's port.

[0115] All callhandles are created in the "idle" state. 
The AVNM then puts the caller's callhandle in the 
"active" state. The AVNM next creates a callhandle for 
the callee and sends it a call event, which places the 
callee's callhandle in the "ringing" state. When the cal- 
lee accepts the call, its callhandle is placed in the 
"active" state, which results in a physical connection 
between the caller and the callee. Each port can have 
an arbitrary number of callhandles bound to it, but typically
only one of these callhandles can be active at the
same time. 
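
The callhandle state sequence described above (idle, then active for the caller and ringing for the callee, then active on acceptance) might be sketched as follows. This Python model is illustrative only: the class and method names are hypothetical, and the bit-level encoding of a callhandle is not represented.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    RINGING = "ringing"
    ACTIVE = "active"


class CallHandle:
    def __init__(self, port):
        self.port = port
        self.state = State.IDLE   # all callhandles start idle


class Avnm:
    """Toy model of call setup; names are illustrative only."""

    def __init__(self):
        self.connections = []     # pairs of ports that are physically connected

    def place_call(self, caller_port, callee_port):
        caller, callee = CallHandle(caller_port), CallHandle(callee_port)
        caller.state = State.ACTIVE
        callee.state = State.RINGING   # effect of sending the call event
        return caller, callee

    def accept(self, caller, callee):
        if callee.state is not State.RINGING:
            raise ValueError("callee is not ringing")
        callee.state = State.ACTIVE
        # Both ends active: the physical connection is established.
        self.connections.append((caller.port, callee.port))


avnm = Avnm()
caller, callee = avnm.place_call("port-81", "port-82")
state_before_accept = callee.state
avnm.accept(caller, callee)
```
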

[0116] After a call has been set up, AVNM clients can
send requests to the AVNM to change the state of the
call, which can advantageously be accomplished by
controlling the callhandle states. For example, during a
call, a call request from another party could arrive. This
arrival could be signaled to the user by providing an
alert indication in a dialog box on the user's CMW
screen. The user could refuse the call by clicking on a
refuse button in the dialog box, or could click on a "hold"
button on the active call window to put the current call
on hold and allow the incoming call to be accepted.
[0117] The placing of the currently active call on hold
can advantageously be accomplished by changing the
caller's callhandle from the active state to a "hold" state,
which permits the caller to answer incoming calls or
initiate new calls, without releasing the previous call.
Since the connection set-up to the callee will be retained,
a call on hold can conveniently be resumed by the caller
clicking on a resume button on the active call window,
which returns the corresponding callhandle back to the
active state. Typically, multiple calls can be put on hold
in this manner. As an aid in managing calls that are on
hold, the CMW advantageously provides a hold list
display, identifying these on-hold calls and (optionally)
the length of time that each party is on hold. A
corresponding face icon could be used to identify each
on-hold call. In addition, buttons could be provided in this
hold display which would allow the user to send a
preprogrammed message to a party on hold. For example, this
message could advise the callee when the call will be
resumed, or could state that the call is being terminated
and will be reinitiated at a later time.
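
A hold list that reports how long each party has been on hold might be sketched as follows. The Python class below is a hypothetical illustration; it takes an injectable clock so that the elapsed times can be checked deterministically.

```python
class HoldList:
    """Tracks calls on hold and how long each has been held."""

    def __init__(self, clock):
        self._clock = clock   # callable returning the current time in seconds
        self._held = {}       # party -> time at which the call was put on hold

    def hold(self, party):
        self._held[party] = self._clock()

    def resume(self, party):
        # Removing the entry models returning the callhandle to the active state.
        del self._held[party]

    def display(self):
        now = self._clock()
        return [(party, now - since) for party, since in self._held.items()]


fake_time = [0.0]
holds = HoldList(clock=lambda: fake_time[0])
holds.hold("bob")
fake_time[0] = 30.0
holds.hold("carol")
fake_time[0] = 45.0
snapshot = holds.display()
holds.resume("bob")
```

In a running system the clock argument could be `time.monotonic` from the Python standard library rather than the fake clock used here.
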
[0118] Reference is now directed to Figure 24 which
diagrammatically illustrates how two-party calls are
connected for CMWs WS-1 and WS-2, located at the same
MLAN 10. As shown in Figure 24, CMWs WS-1 and WS-2
are coupled to the local A/V Switching Circuitry 30 via
ports 81 and 82, respectively. As previously described,
when CMW WS-1 calls CMW WS-2, a callhandle is created
for each port. If CMW WS-2 accepts the call, these
two callhandles become active and in response thereto,
the AVNM causes the A/V Switching Circuitry 30 to set
up the appropriate connections between ports 81 and
82, as indicated by the dashed line 83.
[0119] Figure 25 diagrammatically illustrates how
two-party calls are connected for CMWs WS-1 and WS-2
when located in different MLANs 10a and 10b. As
illustrated in Figure 25, CMW WS-1 of MLAN 10a is
connected to a port 91a of A/V Switching Circuitry 30a of
MLAN 10a, while CMW WS-2 is connected to a port 91b
of the audio/video switching circuit 30b of MLAN 10b. It
will be assumed that MLANs 10a and 10b can communicate
with each other via ports 92a and 92b (through
respective WAN gateways 40a and 40b and WAN 15). A
call between CMWs WS-1 and WS-2 can then be established
by AVNM of MLAN 10a, in response to the creation
of callhandles at ports 91a and 92a, setting up
appropriate connections between these ports as indicated
by dashed line 93a, and by AVNM of MLAN 10b,
in response to callhandles created at ports 91b and 92b,
setting up appropriate connections between these ports
as indicated by dashed line 93b. Appropriate paths 94a
and 94b in WAN gateways 40a and 40b, respectively,
are set up by the WAN network manager 65 (Figure 21)
in each network.

CONFERENCE CALLS 

[0120] Next to be described is the specific manner in
which the preferred embodiment provides for multi-party
conference calls (involving more than two participants).
When a multi-party conference call is initiated, the CMW
provides a screen that is similar to the screen for
two-party calls, which displays a live video picture of the
callee's image in a video window. However, for multi-party
calls, the screen includes a video mosaic containing a
live video picture of each of the conference participants
(including the CMW user's own picture), as shown, for
example, in Figure 8B. Of course, other embodiments
could show only the remote conference participants
(and not the local CMW user) in the conference mosaic
(or show a mosaic containing both participants in a
two-party call). In addition to the controls shown in Figure
8B, the multi-party conference screen also includes
buttons/menu items that can be used to place individual
conference participants on hold, to remove individual
participants from the conference, to adjourn the entire
conference, or to provide a "close-up" image of a single
individual (in place of the video mosaic).
[0121] Multi-party conferencing requires all the
mechanisms employed for 2-party calls. In addition, it
requires the conference bridge manager CBM 64 (Figure
21) and the conference bridges 36 (Figure 3). The
CBM acts as a client of the AVNM in managing the
operation of the conference bridges 36. The CBM also acts
as a server to other clients on the network. The CBM
makes conferencing services available by creating
service records of type "conference" in the AVNM service
database and associating these services with the ports
on A/V Switching Circuitry 30 for connection to
conference bridges 36.

[0122] The preferred embodiment provides two ways
for initiating a conference call. The first way is to add
one or more parties to an existing two-party call. For this
purpose, an ADD button is provided by both the
Collaboration Initiator and the Rolodex, as illustrated in
Figures 2A and 22. To add a new party, a user selects the
party to be added (by clicking on the user's rolodex
name or face icon as described above) and clicks on the
ADD button to invite that new party. Additional parties
can be invited in a similar manner. The second way to
initiate a conference call is to select the parties in a
similar manner and then click on the CALL button (also
provided in the Collaboration Initiator and Rolodex windows
on the user's CMW screen).

[0123] Another alternative embodiment is to initiate a
conference call from the beginning by clicking on a
CONFERENCE/MOSAIC icon/button/menu item on the
CMW screen. This could initiate a conference call with
the call initiator as the sole participant (i.e., causing a
conference bridge to be allocated such that the caller's
image also appears on his/her own screen in a video
mosaic, which will also include images of subsequently
added participants). New participants could be invited,
for example, by selecting each new party's face icon
and then clicking on the ADD button.
[0124] Next to be considered with reference to Figures
26 and 27 is the manner in which conference calls are
handled in the preferred embodiment. For the purposes of
this description it will be assumed that up to four parties
may participate in a conference call. Each conference
uses four bridge ports 136-1, 136-2, 136-3 and 136-4
provided on A/V Switching Circuitry 30a, which are
respectively coupled to bidirectional audio/video lines
36-1, 36-2, 36-3 and 36-4 connected to conference
bridge 36. However, from this description it will be
apparent how a conference call may be provided for
additional parties, as well as simultaneously occurring
conference calls.

[0125] Once the Collaboration Initiator determines
that a conference is to be initiated, it queries the AVNM
for a conference service. If such a service is available,
the Collaboration Initiator requests the associated CBM
to allocate a conference bridge. The Collaboration
Initiator then places an audio/video call to the CBM to
initiate the conference. When the CBM accepts the call, the
AVNM couples port 101 of CMW WS-1 to lines 36-1 of
conference bridge 36 by a connection 137 produced in
response to callhandles created for port 101 of WS-1
and bridge port 136-1.

[0126] When the user of WS-1 selects the appropriate
face icon and clicks the ADD button to invite a new
participant to the conference, which will be assumed to be
CMW WS-3, the Collaboration Initiator on WS-1 sends
an add request to the CBM. In response, the CBM calls
WS-3 via WS-3 port 103. When CBM initiates the call,
the AVNM creates callhandles for WS-3 port 103 and
bridge port 136-2. When WS-3 accepts the call, its
callhandle is made "active," resulting in connection 138
being provided to connect WS-3 and lines 136-2 of
conference bridge 35. Assuming CMW WS-1 next adds
CMW WS-5 and then CMW WS-8, callhandles for their
respective ports and bridge ports 136-3 and 136-4 are
created, in turn, as described above for WS-1 and WS-3,
resulting in connections 139 and 140 being provided
to connect WS-5 and WS-8 to conference bridge lines
36-3 and 36-4, respectively. The conferees WS-1, WS-3,
WS-5 and WS-8 are thus coupled to conference
bridge lines 136-1, 136-2, 136-3 and 136-4, respectively,
as shown in Figure 26.

[0127] It will be understood that the video mosaicing
circuitry 36 and audio mixing circuitry 38 incorporated in
conference bridge 36 operate as previously described,
to form a resulting four-picture mosaic (Figure 8B) that
is sent to all of the conference participants, which in this
example are CMWs WS-1, WS-3, WS-5 and WS-8.
Users may leave a conference by just hanging up, which
causes the AVNM to delete the associated callhandles
and to send a hangup notification to CBM. When CBM
receives the notification, it notifies all other conference
participants that the participant has exited. In the
preferred embodiment, this results in a blackened portion of
that participant's video mosaic image being displayed
on the screen of all remaining participants.
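
The allocation of bridge ports to conferees, and the effect of a participant hanging up, might be sketched as follows. This Python model is illustrative only; the class and its methods are hypothetical, and the port labels simply follow the reference numerals used in the example above.

```python
class ConferenceBridge:
    """Four-slot bridge; slot i corresponds to bridge port 136-(i+1)."""

    def __init__(self, size=4):
        self.slots = [None] * size

    def add(self, participant):
        for i, occupant in enumerate(self.slots):
            if occupant is None:
                self.slots[i] = participant
                return f"136-{i + 1}"
        raise RuntimeError("no free bridge port")

    def hang_up(self, participant):
        """Free the slot and return the participants to notify."""
        i = self.slots.index(participant)
        self.slots[i] = None   # shown as a blackened tile in the mosaic
        return [p for p in self.slots if p is not None]


bridge = ConferenceBridge()
ports = [bridge.add(ws) for ws in ("WS-1", "WS-3", "WS-5", "WS-8")]
notified = bridge.hang_up("WS-3")
```
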
[0128] The manner in which the CBM and the conference
bridge 36 operate when conference participants
are located at different sites will be evident from the
previously described operation of the cut-and-paste
circuitry 39 (Figure 10) with the video mosaicing circuitry
36 (Figure 7) and audio mixing circuitry 38 (Figure 9). In
such case, each incoming single video picture or
mosaic from another site is connected to a respective
one of the conference bridge lines 36-1 to 36-4 via WAN
gateway 40.

[0129] The situation in which a two-party call is
converted to a conference call will next be considered in
connection with Figure 27 and the previously considered
2-party call illustrated in Figure 24. Converting this
2-party call to a conference requires that this two-party
call (such as illustrated between WS-1 and WS-2 in Figure
24) be rerouted dynamically so as to be coupled
through conference bridge 36. When the user of WS-1
clicks on the ADD button to add a new party (for example
WS-5), the Collaboration Initiator of WS-1 sends a
redirect request to the AVNM, which cooperates with
the CBM to break the two-party connection 83 in Figure
24, and then redirect the callhandles created for ports
81 and 82 to callhandles created for bridge ports 136-1
and 136-2, respectively.

[0130] As shown in Figure 27, this results in producing
a connection 86 between WS-1 and bridge port 136-1,
and a connection 87 between WS-2 and bridge port
136-2, thereby creating a conference set-up between
WS-1 and WS-2. Additional conference participants can
then be added as described above for the situations
in which the conference is initiated by
the user of WS-1 either selecting multiple participants
initially or merely selecting a "conference" and then
adding subsequent participants.
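
By way of illustration only, the rerouting described in connection with Figures 24 and 27 can be sketched in Python as follows. The `Connection` record, the `convert_to_conference` function and the list-based bookkeeping are hypothetical names introduced for this sketch; the actual AVNM and CBM interfaces are not specified in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    a: str  # one endpoint (a workstation or a bridge port)
    b: str  # the other endpoint


def convert_to_conference(connections, call, bridge_ports):
    """Break a two-party connection and attach each party to a bridge port.

    connections  -- list of active Connection objects (hypothetical state)
    call         -- the two-party Connection to be rerouted
    bridge_ports -- free conference bridge ports, e.g. ["136-1", "136-2"]
    Returns (new_connections, remaining_free_ports).
    """
    if len(bridge_ports) < 2:
        raise ValueError("two free bridge ports are needed")
    # Break the direct two-party connection.
    remaining = [c for c in connections if c != call]
    # Redirect each party to its own bridge port.
    rerouted = [
        Connection(call.a, bridge_ports[0]),
        Connection(call.b, bridge_ports[1]),
    ]
    return remaining + rerouted, bridge_ports[2:]
```

With ports 136-1 to 136-4 free, converting the WS-1/WS-2 call in this sketch leaves 136-3 and 136-4 available for parties added later, such as WS-5.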
[0131] Having described the preferred manner in
which two-party calls and conference calls are set up in
the preferred embodiment, the preferred manner in
which data conferencing is provided between CMWs
will next be described.

DATA CONFERENCING 

[0132] Data conferencing is implemented in the preferred
embodiment by certain Snapshot Sharing software
provided at the CMW (see Figure 20). This
software permits a "snapshot" of a selected portion of a
participant's CMW screen (such as a window) to be
displayed on the CMW screens of other selected
participants (whether or not those participants are also
involved in a videoconference). Any number of snapshots
may be shared simultaneously. Once displayed,
any participant can then telepoint on or annotate the
snapshot, which animated actions and results will
appear (virtually simultaneously) on the screens of all
other participants. The annotation capabilities provided
include lines of several different widths and text of
several different sizes. Also, to facilitate participant
identification, these annotations may be provided in a different
color for each participant. Any annotation may also be
erased by any participant. Figure 2B (lower left window)
illustrates a CMW screen having a shared graph on
which participants have drawn and typed to call
attention to or supplement specific portions of the shared
image.

[0133] A participant may initiate data conferencing
with selected participants (selected and added as
described above for videoconference calls) by clicking
on a SHARE button on the screen (available in the Rolodex
or Collaboration Initiator windows, shown in Figure
2A, as are CALL and ADD buttons), followed by selection
of the window to be shared. When a participant
clicks on his SHARE button, his Collaboration Initiator
module 161 (Figure 20) queries the AVNM to locate the
Collaboration Initiators of the selected participants,
resulting in invocation of their respective Snapshot
Sharing modules 164. The Snapshot Sharing software
modules at the CMWs of each of the selected participants
query their local operating system 180 to determine
available graphic formats, and then send this
information to the initiating Snapshot Sharing module,
which determines the format that will produce the most
advantageous display quality and performance for each
selected participant.
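
The format selection performed by the initiating Snapshot Sharing module might be sketched as below. This is a minimal sketch under stated assumptions: the text does not say how "most advantageous" is decided, so a simple best-first ranking held by the initiator is assumed, and the function name and format labels are hypothetical.

```python
def choose_formats(preference_order, reported_formats):
    """Pick one graphic format per participant.

    preference_order -- formats ranked best-first by the initiating module
                        (the ranking criterion is an assumption of this sketch)
    reported_formats -- {participant: set of formats its operating system offers}
    Returns {participant: chosen format, or None if nothing matches}.
    """
    chosen = {}
    for participant, available in reported_formats.items():
        # Take the first format in the initiator's ranking that the
        # participant reported as available.
        chosen[participant] = next(
            (fmt for fmt in preference_order if fmt in available), None
        )
    return chosen
```

Because the choice is made per participant, a CMW with a limited display does not force a lower-quality format on the others.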

[0134] After the snapshot to be shared is displayed on
all CMWs, each participant may telepoint on or annotate
the snapshot, which actions and results are displayed
on the CMW screens of all participants. This is preferably
accomplished by monitoring the actions made at the
CMW (e.g., by tracking mouse movements) and sending
these "operating system commands" to the CMWs
of the other participants, rather than continuously
exchanging bitmaps, as would be the case with
traditional "remote control" products.
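
The difference between sending actions and sending bitmaps can be illustrated with the following sketch. The JSON message layout and the function names are assumptions made for illustration; the text does not define a wire format for the "operating system commands".

```python
import json


def encode_action(participant, kind, **details):
    """Serialize one telepointing or annotation action as a small message."""
    message = {"participant": participant, "kind": kind, **details}
    return json.dumps(message).encode("utf-8")


def apply_action(local_actions, message):
    """Decode a received message and append it to a CMW's local action list."""
    action = json.loads(message.decode("utf-8"))
    local_actions.append(action)
    return action


def broadcast(screens, message):
    """Deliver the same message to every participant's CMW, sender included."""
    for local_actions in screens.values():
        apply_action(local_actions, message)
```

A message of this kind is on the order of a hundred bytes, whereas an uncompressed 640 x 480 bitmap at one byte per pixel (an assumed size, not one given in the text) is 307,200 bytes, which is why replaying actions locally is so much cheaper than exchanging bitmaps.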

[0135] As illustrated in Figure 28, the original
unchanged snapshot is stored in a first bitmap 210a. A
second bitmap 210b stores the combination of the
original snapshot and any annotations. Thus, when desired
(e.g., by clicking on a CLEAR button located in each
participant's Share window, as illustrated in Figure 2B),
the original unchanged snapshot can be restored (i.e.,
erasing all annotations) using bitmap 210a. Selective
erasures can be accomplished by overwriting (i.e.,
restoring) the area of bitmap 210b to be erased with
the corresponding portion of bitmap 210a.
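
The two-bitmap arrangement of Figure 28 can be sketched as follows. The class and method names are hypothetical, and nested lists of pixel values stand in for real bitmaps.

```python
import copy


class SharedSnapshot:
    def __init__(self, pixels):
        # Plays the role of bitmap 210a: the unchanged snapshot.
        self.original = copy.deepcopy(pixels)
        # Plays the role of bitmap 210b: snapshot plus annotations.
        self.annotated = copy.deepcopy(pixels)

    def annotate(self, x, y, value):
        """Draw one annotation pixel; only the annotated bitmap changes."""
        self.annotated[y][x] = value

    def erase_region(self, x0, y0, x1, y1):
        """Selective erase: restore columns x0..x1-1 of rows y0..y1-1."""
        for y in range(y0, y1):
            self.annotated[y][x0:x1] = self.original[y][x0:x1]

    def clear(self):
        """CLEAR button: discard all annotations."""
        self.annotated = copy.deepcopy(self.original)
```

Keeping the original untouched means that erasing never needs to know what was drawn; it only copies pixels back.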
[0136] Rather than causing a new Share window to be
created whenever a snapshot is shared, it is possible to
replace the contents of an existing Share window with a
new image. This can be achieved in either of two ways.
First, the user can click on the GRAB button and then
select a new window whose contents should replace the
contents of the existing Share window. Second, the user
can click on the REGRAB button to cause a (presumably
modified) version of the original source window to
replace the contents of the existing Share window. This
is particularly useful when one participant desires to
share a long document that cannot be displayed on the
screen in its entirety. For example, the user might
display the first page of a spreadsheet on his screen, use
the SHARE button to share that page, discuss and perhaps
annotate it, then return to the spreadsheet application
to position to the next page, use the REGRAB
button to share the new page, and so on. This mechanism
represents a simple, effective step toward application
sharing.
[0137] Further, instead of sharing a snapshot of data
on his current screen, a user may instead choose to
share a snapshot that had previously been saved as a
file. This is achieved via the LOAD button, which causes
a dialog box to appear, prompting the user to select a
file. Conversely, via the SAVE button, any snapshot may
be saved, with all current annotations.
[0138] The capabilities described above were carefully
selected to be particularly effective in environments
where the principal goal is to share existing information,
rather than to create new information. In particular, user
interfaces are designed to make snapshot capture,
telepointing and annotation extremely easy to use.
Nevertheless, it is also to be understood that, instead of
sharing snapshots, a blank "whiteboard" can also be
shared (via the WHITEBOARD button provided by the
Rolodex, Collaboration Initiator, and active call windows),
and that more complex paintbox capabilities
could easily be added for application areas that require
such capabilities.

[0139] As pointed out previously herein, important features
of the present invention reside in the manner in
which the capabilities and advantages of multimedia
mail (MMM), multimedia conference recording (MMCR),
and multimedia document management (MMDM) are
tightly integrated with audio/video/data teleconferencing
to provide a multimedia collaboration system that
facilitates an unusually higher level of communication and
collaboration between geographically dispersed users
than has heretofore been achievable by known prior art
systems. Figure 29 is a schematic and diagrammatic
view illustrating how multimedia calls/conferences,
MMCR, MMM and MMDM work together to provide the
above-described features. In the preferred embodiment,
MM Editing Utilities shown supplementing MMM and
MMDM may be identical.

[0140] Having already described various embodiments
and examples of audio/video/data teleconferencing,
next to be considered are various ways of
integrating MMCR, MMM and MMDM with
audio/video/data teleconferencing in accordance with
the invention. For this purpose, basic preferred
approaches and features of each will be considered
along with preferred associated hardware and software.

MULTIMEDIA DOCUMENTS 

[0141] In one embodiment, the creation, storage,
retrieval and editing of multimedia documents serve as
the basic element common to MMCR, MMM and
MMDM. Accordingly, the preferred embodiment
advantageously provides a universal format for multimedia
documents. This format defines multimedia documents
as a collection of individual components in multiple
media combined with an overall structure and timing
component that captures the identities, detailed
dependencies, references to, and relationships among
the various other components. The information
provided by this structuring component forms the basis for
spatial layout, order of presentation, hyperlinks,
temporal synchronization, etc., with respect to the composition
of a multimedia document. Figure 30 shows the structure
of such documents as well as their relationship with
editing and storage facilities.
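
One way to picture such a document is as a set of media components plus a separate structure and timing component that refers to them by name. The sketch below is illustrative only; the class names, fields and example locations are hypothetical and are not the universal format itself, whose details are not given here.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    name: str
    medium: str    # e.g. "text", "raster", "audio/video"
    location: str  # file path or storage-server reference (illustrative)


@dataclass
class TimingEntry:
    component: str     # name of the component being placed
    start: float       # seconds from the start of the document
    after: tuple = ()  # names of components this one depends on


@dataclass
class MultimediaDocument:
    components: dict = field(default_factory=dict)
    structure: list = field(default_factory=list)

    def add(self, component, start, after=()):
        """Register a component and record its timing and dependencies."""
        missing = [name for name in after if name not in self.components]
        if missing:
            raise KeyError(f"unknown dependencies: {missing}")
        self.components[component.name] = component
        self.structure.append(TimingEntry(component.name, start, tuple(after)))

    def presentation_order(self):
        """Order of presentation derived from the structure component alone."""
        return [e.component for e in sorted(self.structure, key=lambda e: e.start)]
```

Keeping the structure separate from the media lets each component live wherever suits it (a conventional file or a real-time server) while layout and ordering are still computed in one place.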
[0142] Each of the components of a multimedia document
uses its own editors for creating, editing, and viewing.
In addition, each component may use dedicated
storage facilities. In the preferred embodiment,
multimedia documents are advantageously structured for
authoring, storage, playback and editing by storing
some data under conventional file systems and some
data in special-purpose storage servers, as will be
discussed later. The Conventional File System 504 can be
used to store all non-time-sensitive portions of a
multimedia document. In particular, the following are
examples of non-time-sensitive data that can be stored in a
conventional type of computer file system:

1. structured and unstructured text

2. raster images

3. structured graphics and vector graphics (e.g.,
PostScript)

4. references to files in other file systems (video,
hi-fidelity audio, etc.) via pointers

5. restricted forms of executables

6. structure and timing information for all of the
above (spatial layout, order of presentation,
hyperlinks, temporal synchronization, etc.)

[0143] Of particular importance in multimedia documents
is support for time-sensitive media and media
that have synchronization requirements with other
media components. Some of these time-sensitive
media can be stored on conventional file systems while
others may require special-purpose storage facilities.
[0144] Examples of time-sensitive media that can be
stored on conventional file systems are small audio files
and short or low-quality video clips (e.g., as might be
produced using QuickTime or Video for Windows).
Other examples include window event lists as supported
by the Window-Event Record and Play system
512 shown in Figure 30. This component allows for storing
and replaying a user's interactions with application
programs by capturing the requests and events
exchanged between the client program and the window
system in a time-stamped sequence. After this "record"
phase, the resulting information is stored in a conventional
file that can later be retrieved and "played" back.
During playback the same sequence of window system
requests and events reoccurs with the same relative
timing as when they were recorded. In prior-art systems,
this capability has been used for creating automated
demonstrations. In the present invention it can be
used, for example, to reproduce annotated snapshots
as they occurred at recording.
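
The record and play phases can be sketched as follows. This is a simplified illustration, not the Window-Event Record and Play system 512 itself: `EventRecorder` and `play` are hypothetical names, and plain strings stand in for window system requests and events.

```python
import time


class EventRecorder:
    """Record events with time stamps relative to the first event."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._start = None
        self.events = []  # list of (offset_seconds, event)

    def record(self, event):
        now = self._clock()
        if self._start is None:
            self._start = now
        self.events.append((now - self._start, event))


def play(events, handler, sleep=time.sleep):
    """Replay events in order, preserving their relative timing."""
    previous = 0.0
    for offset, event in events:
        sleep(offset - previous)  # wait for the recorded gap
        handler(event)
        previous = offset
```

The `clock` and `sleep` parameters exist so that the timing source can be replaced, for example by a media-driven clock rather than the system clock.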

[0145] As described above in connection with collaborative
workstation software, Snapshot Share 518
shown in Figure 30 is a utility used in multimedia calls
and conferencing for capturing window or screen snapshots,
sharing with one or more call or conference participants,
and permitting group annotation, telepointing,
and re-grabs. Here, this utility is adapted so that its
captured images and window events can be recorded by
the Window-Event Record and Play system 512 while
being used by only one person. By synchronizing
events associated with a video or audio stream to specific
frame numbers or time codes, a multimedia call or
conference can be recorded and reproduced in its
entirety. Similarly, the same functionality is preferably
used to create multimedia mail whose authoring steps
are virtually identical to participating in a multimedia call
or conference (though other forms of MMM are not
precluded).

[0146] Some time-sensitive media require dedicated
storage servers in order to satisfy real-time requirements.
High-quality audio/video segments, for example,
require dedicated real-time audio/video storage servers.
A preferred embodiment of such a server will be
described later. Next to be considered is how the current
invention guarantees synchronization between different
media components.

MEDIA SYNCHRONIZATION 

[0147] A preferred manner for providing multimedia
synchronization in the preferred embodiment will next
be considered. Only multimedia documents with real-time
material need include synchronization functions
and information. Synchronization for such situations
may be provided as described below.
[0148] Audio or video segments can exist without
being accompanied by the other. If audio and video are
recorded simultaneously ("co-recorded"), the preferred
embodiment allows the case where their streams are
recorded and played back with automatic synchronization,
as would result from conventional VCRs, laserdisks,
or time-division multiplexed ("interleaved")
audio/video streams. This excludes the need to tightly
synchronize (i.e., "lip-sync") separate audio and video
sequences. Rather, reliance is on the co-recording
capability of the Real-Time Audio/Video Storage Server
502 to deliver all closely synchronized audio and video
directly at its signal outputs.

[0149] Each recorded video sequence is tagged with
time codes (e.g., SMPTE at 1/30 second intervals) or
video frame numbers. Each recorded audio sequence is
tagged with time codes (e.g., SMPTE or MIDI) or, if
co-recorded with video, video frame numbers.
[0150] The preferred embodiment also provides
synchronization between window events and audio and/or
video streams. The following functions are supported:

1. Media-time-driven Synchronization: synchronization
of window events to an audio, video, or
audio/video stream, using the real-time media as
the timing source.

2. Machine-time-driven Synchronization:

a. synchronization of window events to the system
clock

b. synchronization of the start of an audio,
video, or audio/video segment to the system
clock

[0151] If no audio or video is involved, machine-time-driven
synchronization is used throughout the document.
Whenever audio and/or video is playing, media-time
synchronization is used. The system supports
transition between machine-time and media-time
synchronization whenever an audio/video segment is
started or stopped.
[0152] As an example, viewing a multimedia document
might proceed as follows:

Document starts with an annotated share
(machine-time-driven synchronization).
Next, start audio only (a "voice annotation") as text
and graphical annotations on the share continue
(audio is timing source for window events).
Audio ends, but annotations continue (machine-time-driven
synchronization).
Next, start co-recorded audio/video continuing with
further annotations on same share (audio is timing
source for window events).
Next, start a new share during the continuing
audio/video recording; annotations happen on both
shares (audio is timing source for window events).
Audio/video stops, annotations on both shares
continue (machine-time-driven synchronization).
Document ends.
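
The switching rule behind this example (media time whenever an audio/video segment is playing, machine time otherwise) can be sketched as follows. The function name and the step encoding are assumptions made for illustration.

```python
def timing_sources(steps):
    """Return the timing source in force after each step of a document.

    steps -- list of (description, delta) pairs, where delta is +1 when an
             audio/video segment starts, -1 when one stops, and 0 otherwise.
    """
    playing = 0  # number of audio/video segments currently playing
    result = []
    for description, delta in steps:
        playing += delta
        if playing < 0:
            raise ValueError("segment stopped before it was started")
        source = "media-time" if playing > 0 else "machine-time"
        result.append((description, source))
    return result


# The viewing sequence described above, encoded for this sketch.
EXAMPLE = [
    ("annotated share", 0),
    ("voice annotation starts", +1),
    ("audio ends", -1),
    ("co-recorded audio/video starts", +1),
    ("second share starts", 0),
    ("audio/video stops", -1),
]
```

Applied to `EXAMPLE`, the sources alternate machine, media, machine, media, media, machine, matching the sequence in the text.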


AUDIO/VIDEO STORAGE 

[0153] As described above, the present invention can
include many special-purpose servers that provide storage
of time-sensitive media (e.g., audio/video streams)
and support coordination with other media. This section
describes the preferred embodiment for audio/video
storage and recording services.
[0154] Although storage and recording services could
be provided at each CMW, it is preferable to employ a
centralized server 502 coupled to MLAN 10, as illustrated
in Figure 31. A centralized server 502, as shown
in Figure 31, provides the following advantages:

1. The total amount of storage hardware required
can be far less (due to better utilization resulting
from statistical averaging).

2. Bulky and expensive compression/decompression
hardware can be pooled on the storage servers
and shared by multiple clients. As a result, fewer
compression/decompression engines of higher
performance are required than if each workstation
were equipped with its own compression/decompression
hardware.

3. Also, more costly centralized codecs can be used
to transfer mail wide area among campuses at far
lower costs than attempting to use data WAN
technologies.

4. File system administration (e.g., backups and file
system replication, etc.) is far less costly and
higher performance.

[0155] The Real-Time Audio/Video Storage Server
502 shown in Figure 31A structures and manages the
audio/video files recorded and stored on its storage
devices. Storage devices may typically include
computer-controlled VCRs, as well as rewritable magnetic or
optical disks. For example, server 502 in Figure 31A
includes disks 60e for recording and playback. Analog
information is transferred between disks 60e and the
A/V Switching Circuitry 30 via analog I/O 62. Control is
provided by control 64 coupled to Data LAN hub 25.
[0156] At a high level, the centralized audio/video storage
and playback server 502 in Figure 31A performs the
following functions:

File Management

It provides mechanisms for creating, naming,
time-stamping, storing, retrieving, copying,
deleting, and playing back some or all portions
of an audio/video file.

File Transfer and Replication

The audio/video file server supports replication
of files on different disks managed by the same
file server to facilitate simultaneous access to
the same files. Moreover, file transfer facilities
are provided to support transmission of
audio/video files between itself and other
audio/video storage and playback engines. File
transfer can also be achieved by using the
underlying audio/video network facilities: servers
establish a real-time audio/video network
connection between themselves so one server
can "play back" a file while the second server
simultaneously records it.

Disk Management

The storage facilities support specific disk allocation,
garbage collection and defragmentation
facilities. They also support mapping disks with
other disks (for replication and staging modes,
as appropriate) and mapping disks, via I/O
equipment, with the appropriate Video/Audio
network port.

Synchronization Support

Synchronization between audio and video is
ensured by the multiplexing scheme used by
the storage media, typically by interleaving the
audio and video streams in a time-division-multiplexed
fashion. Further, if synchronization is
required with other stored media (such as window
system graphics), then frame numbers,
time codes, or other timing events are generated
by the storage server. An advantageous
way of providing this synchronization in the
preferred embodiment is to synchronize record
and playback to received frame number or time
code events.

Searching

To support intra-file searching, at least start,
stop, pause, fast forward, reverse, and fast
reverse operations are provided. To support
inter-file searching, audio/video tagging, or
more generalized "go-to" operations and mechanisms,
such as frame numbers or time code,
are supported at a search-function level.

Connection Management

The server handles requests for audio/video
network connections from client programs
(such as video viewers and editors running on
client workstations) for real-time recording and
real-time playback of audio/video files.

[0157] Next to be considered is how centralized 
audio/video storage servers provide for real-time 
recording and playback of video streams. 

Real-Time Disk Delivery 

[0158] To support real-time audio/video recording and
playback, the storage server needs to provide a real-time
transmission path between the storage medium
and the appropriate audio/video network port for each
simultaneous client accessing the server. For example,
if one user is viewing a video file at the same time
several other people are creating and storing new video
files on the same disk, multiple simultaneous paths to
the storage media are required. Similarly, video mail
sent to large distribution groups, video databases, and
similar functions may also require simultaneous access
to the same video files, again imposing multiple access
requirements on the video storage capabilities.
[0159] For storage servers that are based on
computer-controlled VCRs or rewritable laserdisks, a real-time
transmission path is readily available through the direct
analog connection between the disk or tape and the
network port. However, because of this single direct
connection, each VCR or laserdisk can only be accessed
by one client program at a time (multi-head
laserdisks are an exception). Therefore, storage servers
based on VCRs and laserdisks are difficult to scale for
multiple access usage. In the preferred embodiment,
multiple access to the same material is provided by file
replication and staging, which greatly increases storage
requirements and the need for moving information
quickly among storage media units serving different
users.

[0160] Video systems based on magnetic disks are
more readily scalable for simultaneous use by multiple
people. A generalized hardware implementation of such
a scalable storage and playback system 502 is illustrated
in Figure 32. Individual I/O cards 530 supporting
digital and analog I/O are linked by intra-chassis digital
networking (e.g., buses) for file transfer within chassis
532 holding some number of these cards. Multiple chassis
532 are linked by inter-chassis networking. The Digital
Video Storage System available from Parallax
Graphics is an example of such a system implementation.

[0161] The bandwidth available for the transfer of files
among disks is ultimately limited by the bandwidth of
this intra-chassis and inter-chassis networking. For
systems that use sufficiently powerful video compression
schemes, real-time delivery requirements for a
small number of users can be met by existing file system
software (such as the Unix file system), provided
that the block size of the storage system is optimized for
video storage and that sufficient buffering is provided by
the operating system software to guarantee continuous
flow of the audio/video data.

[0162] Special-purpose software/hardware solutions
can be provided to guarantee higher performance under
heavier usage or higher bandwidth conditions. For
example, a higher throughput version of Figure 32 is
illustrated in Figure 33, which uses crosspoint switching,
such as provided by SCSI Crossbar 540, which
increases the total bandwidth of the inter-chassis and
intra-chassis network, thereby increasing the number of
possible simultaneous file transfers.

Real-Time Network Delivery

[0163] By using the same audio/video format as used
for audio/video teleconferencing, the audio/video storage
system can leverage the previously described network
facilities: the MLANs 10 can be used to establish a
multimedia network connection between client workstations
and the audio/video storage servers. Audio/video
editors and viewers running on the client workstation
use the same software interfaces as the multimedia
teleconferencing system to establish these network
connections.

[0164] The resulting architecture is shown in Figure
31B. Client workstations use the existing audio/video
network to connect to the storage server's network
ports. These network ports are connected to
compression/decompression engines that plug into the server
bus. These engines compress the audio/video streams
that come in over the network and store them on the
local disk. Similarly, for playback, the server reads
stored video segments from its local disk and routes
them through the decompression engines back to client
workstations for local display.
[0165] The present invention allows for alternative
delivery strategies. For example, some compression
algorithms are asymmetric, meaning that decompression
requires much less compute power than compression.
In some cases, real-time decompression can even
be done in software, without requiring any special-purpose
decompression hardware. As a result, there is no
need to decompress stored audio and video on the storage
server and play it back in real time over the network.
Instead, it can be more efficient to transfer an entire
audio/video file from the storage server to the client
workstation, cache it on the workstation's disk, and play
it back locally. These observations lead to a modified
architecture as presented in Figure 31C. In this
architecture, clients interact with the storage server as
follows:

To record video, clients set up real-time audio/video
network connections to the storage server as
before (this connection could make use of an analog
line).

In response to a connection request, the storage
server allocates a compression module to the new
client.

As soon as the client starts recording, the storage
server routes the output from the compression
hardware to an audio/video file allocated on its local
storage devices.

For playback, this audio/video file gets transferred
over the data network to the client workstation and
pre-staged on the workstation's local disk.

The client uses local decompression software
and/or hardware to play back the audio/video on its
local audio and video hardware.
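
The interaction just listed can be sketched as follows. This is an illustration under stated assumptions: Python's `zlib` stands in for the audio/video codec, recording is collapsed into a single call, and the class and method names are hypothetical.

```python
import zlib  # stand-in for the compression hardware; an assumption of this sketch


class StorageServer:
    def __init__(self, compression_modules):
        self.free_modules = compression_modules  # pooled compression modules
        self.files = {}                          # name -> compressed bytes

    def record(self, name, raw_stream):
        """Allocate a module, store the compressed file, release the module."""
        if self.free_modules == 0:
            raise RuntimeError("no compression module available")
        self.free_modules -= 1
        try:
            self.files[name] = zlib.compress(raw_stream)
        finally:
            self.free_modules += 1

    def transfer(self, name):
        """Whole-file transfer over the data network; no codec is involved."""
        return self.files[name]


class Client:
    def __init__(self):
        self.local_disk = {}

    def prestage(self, server, name):
        """Copy the compressed file to the workstation's local disk."""
        self.local_disk[name] = server.transfer(name)

    def play(self, name):
        """Decompress locally, without using any server-side engine."""
        return zlib.decompress(self.local_disk[name])
```

In this sketch playback never touches `free_modules`, which mirrors the point that the server's codecs and audio/video ports remain available for recording.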

[0166] This approach frees up audio/video network
ports and compression/decompression engines on the
server. As a result, the server can be scaled to support a
higher number of simultaneous recording sessions,
thereby further reducing the cost of the system. Note
that such an architecture can be part of a preferred
embodiment for reasons other than
compression/decompression asymmetry (such as the economics
of the technology of the day, existing embedded
base in the enterprise, etc.).

MULTIMEDIA CONFERENCE RECORDING 

[0167] Multimedia conference recording (MMCR) will
next be considered. For full-feature multimedia desktop
calls and conferencing (e.g., audio/video calls or conferences
with snapshot share), recording (storage) capabilities
are preferably provided for audio and video of all
parties, and also for all shared windows, including any
telepointing and annotations provided during the
teleconference. Using the multimedia synchronization
facilities described above, these capabilities are provided
in a way such that they can be replayed with accurate
correspondence in time to the recorded audio and
video, such as by synchronizing to frame numbers or
time code events.

[0168] A preferred way of capturing audio and video
from calls would be to record all calls and conferences
as if they were multi-party conferences (even for
two-party calls), using video mosaicing, audio mixing and
cut-and-pasting, as previously described in connection
with Figures 7-11. It will be appreciated that MMCR as
described will advantageously permit users at their
desktop to review real-time collaboration as it previously
occurred, including during a later teleconference. The
output of an MMCR session is a multimedia document
that can be stored, viewed, and edited using the
multimedia document facilities described earlier.
[0169] Figure 31D shows how conference recording
relates to the various system components described
earlier. The Multimedia Conference Record/Play system
522 provides the user with the additional GUIs
(graphical user interfaces) and other functions required
to provide the previously described MMCR functionality.
[0170] The Conference Invoker 518 shown in Figure
31D is a utility that coordinates the audio/video calls that
must be made to connect the audio/video storage
server 502 with special recording outputs on conference
bridge hardware (35 in Figure 3). The resulting recording
is linked to information identifying the conference, a
function also performed by this utility.

MULTIMEDIA MAIL 

[0171] Now considering multimedia mail (MMM), it will
be understood that MMM adds to the above-described
MMCR the capability of delivering delayed collaboration,
as well as the additional ability to review the information
multiple times and, as described hereinafter, to
edit, re-send, and archive it. The captured information is
preferably a superset of that captured during MMCR,
except that no other user is involved and the user is
given a chance to review and edit before sending the
message.

[0172] The Multimedia Mail system 524 in Figure 31D
provides the user with the additional GUIs and other
functions required to provide the previously described
MMM functionality. Multimedia Mail relies on a conventional
Email system 506 shown in Figure 31D for creating,
transporting, and browsing messages. However,
multimedia document editors and viewers are used for
creating and viewing message bodies. Multimedia documents
(as described above) consist of time-insensitive
components and time-sensitive components. The
Conventional Email system 506 relies on the Conventional
File system 504 and Real-Time Audio/Video Storage
Server 502 for storage support. The time-insensitive
components are transported within the Conventional
Email system 506, while the real-time components may
be separately transported through the audio/video network
using file transfer utilities associated with the
Real-Time Audio/Video Storage Server 502.
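
The division of a message between the two transport paths can be sketched as follows. The function name and the pair-based representation of components are hypothetical; only the rule itself (time-insensitive parts by email, real-time parts by the audio/video network) comes from the text.

```python
def split_for_transport(components):
    """Partition a message's components by transport path.

    components -- iterable of (name, time_sensitive) pairs
    Returns (via_email, via_av_network) as two lists of names.
    """
    via_email, via_av_network = [], []
    for name, time_sensitive in components:
        if time_sensitive:
            # Real-time media travel separately, via the storage server's
            # file transfer utilities.
            via_av_network.append(name)
        else:
            # Text, images and structure travel inside the email message.
            via_email.append(name)
    return via_email, via_av_network
```

The email body would then carry references to the separately transported real-time components rather than the media themselves.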
[0173] Multimedia document management (MMDM)
provides long-term, high-volume storage for MMCR and
MMM. The MMDM system assists in providing the
following capabilities to a CMW user:

1. Multimedia documents can be authored as mail
in the MMM system or as call/conference recordings
in the MMCR system and then passed on to
the MMDM system.

2. To the degree supported by external compatible
multimedia editing and authoring systems, multimedia
documents can also be authored by means
other than MMM and MMCR.

3. Multimedia documents stored within the MMDM
system can be reviewed and searched.

4. Multimedia documents stored within the MMDM
system can be used as material in the creation of
subsequent MMM.

5. Multimedia documents stored within the MMDM
system can be edited to create other multimedia
documents.

[0174] The Multimedia Document Management system 526 in Figure 31 D
provides the user with the additional GUIs and other functions
required to provide the previously described MMDM functionality. The
MMDM includes sophisticated searching and editing capabilities in
connection with the MMDM multimedia document such that a user can
rapidly access desired selected portions of a stored multimedia
document. The Specialized Search system 520 in Figure 30 comprises
utilities that allow users to do more sophisticated searches across
and within multimedia documents. This includes context-based and
content-based searches (employing operations such as speech and
image recognition, information filters, etc.), time-based searches,
and event-based searches (window events, call management events,
speech/audio events, etc.).
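The time-based and event-based searches can be pictured as queries over a log of timestamped events stored with a multimedia document. The following is a minimal sketch under that assumption; the specification does not describe the storage format, and the names `Event`, `search_by_time` and `search_by_event` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One logged event in a stored multimedia document (hypothetical record)."""
    t: float      # seconds from the start of the recording
    kind: str     # e.g. "window", "call", "speech"
    detail: str


def search_by_time(events, start, end):
    """Time-based search: events whose offset lies in [start, end], in log order."""
    return [e for e in events if start <= e.t <= end]


def search_by_event(events, kind):
    """Event-based search: events of one kind, sorted by recording offset."""
    return sorted((e for e in events if e.kind == kind), key=lambda e: e.t)


# Illustrative data only.
log = [
    Event(12.0, "call", "participant added"),
    Event(3.5, "window", "share window opened"),
    Event(47.2, "speech", "speaker change"),
    Event(30.0, "call", "call placed on hold"),
]
```

A viewer could then seek playback to the offset `t` of any returned event.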

CLASSES OF COLLABORATION

[0175] The resulting multimedia collaboration environment achieved
by the above-described integration of audio/video/data
teleconferencing, MMCR, MMM and MMDM is illustrated in Figure 34. It
will be evident that each user can collaborate with other users in
real-time despite separations in space and time. In addition,
collaborating users can access information already available within
their computing and information systems, including information
captured from previous collaborations. Note in Figure 34 that space
and time separations are supported in the following ways:

1. Same time, different place

Multimedia calls and conferences

2. Different time, same place

MMDM access to stored MMCR and MMM
information, or use of MMM directly (i.e.,
copying mail to oneself)

3. Different time, different place

MMM

4. Same time, same place

Collaborative, face-to-face, multimedia
document creation

[0176] By use of the same user interfaces and network
functions, the present invention smoothly spans these
three venues.

REMOTE ACCESS TO EXPERTISE 

[0177] In order to illustrate how the present invention may be
implemented and operated, an exemplary preferred embodiment will be
described having features applicable to the aforementioned scenario
involving remote access to expertise. It is to be understood that
this exemplary embodiment is merely illustrative, and is not to be
considered as limiting the scope of the invention, since the
invention may be adapted for other applications (such as in
engineering and manufacturing) or uses having more or less hardware,
software and operating features and combined in various ways.
[0178] Consider the following scenario involving access from remote
sites to an in-house corporate "expert" in the trading of financial
instruments such as in the securities market:

[0179] The focus of the scenario revolves around the activities of a
trader who is a specialist in securities. The setting is the start
of his day at his desk in a major financial center (NYC) at a major
U.S. investment bank.
[0180] The Expert has been actively watching a particular security
over the past week and, upon his arrival at the office, he notices
it is on the rise. Before going home last night, he set up his
system to filter overnight news on a particular family of securities
and a security within that family. He scans the filtered news and
sees a story that may have a long-term impact on the security in
question. He believes he needs to act now in order to get a good
price on the security. Also, through filtered mail, he sees that his
counterpart in London, who has also been watching this security, is
interested in getting our Expert's opinion once he arrives at work.

[0181] The Expert issues a multimedia mail message on the security
to the head of sales worldwide for use in working with their client
base. Also among the recipients is an analyst in the research
department and his counterpart in London. The Expert, in preparation
for his previously established "on-call" office hours, consults with
others within the corporation (using the videoconferencing and other
collaborative techniques described above), accesses company records
from his CMW, and analyzes such information, employing
software-assisted analytic techniques. His office hours are now at
hand so he enters "intercom" mode, which enables incoming calls to
appear automatically (without requiring the Expert to "answer his
phone" and elect to accept or reject the call).
[0182] The Expert's computer beeps, indicating an incoming call, and
the image of a field representative 201 and his client 202 who are
located at a bank branch somewhere in the U.S. appears in video
window 203 of the Expert's screen (shown in Fig. 35). Note that,
unless the call is converted to a "conference" call (whether
explicitly via a menu selection or implicitly by calling two or more
other participants or adding a third participant to a call), the
callers will see only each other in the video window and will not
see themselves as part of a video mosaic.

[0183] Also illustrated on the Expert's screen in Fig. 35 is the
Collaboration Initiator window 204 from which the Expert can
(utilizing Collaboration Initiator software module 161 shown in
Fig. 20) initiate and control various collaborative sessions. For
example, the user can initiate with a selected participant a video
call (CALL button) or the addition of that selected participant to
an existing video call (ADD button), as well as a share session
(SHARE button) using a selected window or region on the screen (or a
blank region via the WHITEBOARD button for subsequent annotation).
The user can also invoke his MAIL software (MAIL button) and prepare
outgoing or check incoming Email messages (the presence of which is
indicated by a picture of an envelope in the dog's mouth in In Box
icon 205), as well as check for "I called" messages from other
callers (MESSAGES button) left via the LEAVE WORD button in video
window 203. Video window 203 also contains buttons from which many
of these and certain additional features can be invoked, such as
hanging up a video call (HANGUP button), putting a call on hold
(HOLD button), resuming a call previously put on hold (RESUME
button) or muting the audio portion of a call (MUTE button). In
addition, the user can invoke the recording of a conference by the
conference RECORD button. Also present on the Expert's screen is a
standard desktop window 206 containing icons from which other
programs (whether or not part of this invention) can be launched.
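The button behavior just described (select a participant, then press CALL, ADD, SHARE, WHITEBOARD or MAIL) can be sketched as a single dispatch function. This is only an illustration of the idea of one common entry point for several collaboration types; the `Collaboration` enum, the `initiate` function and its return format are hypothetical and are not taken from the Collaboration Initiator software module 161.

```python
from enum import Enum


class Collaboration(Enum):
    CALL = "video call"
    ADD = "add to existing call"
    SHARE = "share session"
    WHITEBOARD = "blank share region"
    MAIL = "mail"


def initiate(kind, participants, active_call=None):
    """Describe the session that one button press would start.

    `active_call` lists the participants already in a video call, if any.
    """
    if not participants and kind is not Collaboration.MAIL:
        raise ValueError("select at least one participant first")
    if kind is Collaboration.ADD:
        if active_call is None:
            raise ValueError("ADD requires an existing call")
        # Adding a participant extends the existing call.
        return {"type": Collaboration.CALL, "with": active_call + list(participants)}
    return {"type": kind, "with": list(participants)}
```

For example, `initiate(Collaboration.ADD, ["london"], active_call=["analyst"])` yields a call with both participants, which mirrors how the ADD button extends a two-party call.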
[0184] Returning to the example, the Expert is now engaged in a
videoconference with field representative 201 and his client 202. In
the course of this videoconference, as illustrated in Fig. 36, the
field representative shares with the Expert a graphical image 210
(pie chart of client portfolio holdings) of his client's portfolio
holdings (by clicking on his SHARE button, corresponding to the
SHARE button in video window 203 of the Expert's screen, and
selecting that image from his screen, resulting in the shared image
appearing in the Share window 211 of the screen of all participants
to the share) and begins to discuss the client's investment dilemma.
The field representative also invokes a command to secretly bring up
the client profile on the Expert's screen.
[0185] After considering this information, reviewing the shared
portfolio and asking clarifying questions, the Expert illustrates
his advice by creating (using his own modeling software) and sharing
a new graphical image 220 (Fig. 37) with the field representative
and his client. Either party to the share can annotate that image
using the drawing tools 221 (and the TEXT button, which permits
typed characters to be displayed) provided within Share window 211,
or "regrab" a modified version of the original image (by using the
REGRAB button), or remove all such annotations (by using the CLEAR
button of Share window 211), or "grab" a new image to share (by
clicking on the GRAB button of Share window 211 and selecting that
new image from the screen). In addition, any participant to a shared
session can add a new participant by selecting that participant from
the rolodex or quick-dial list (as described above for video calls
and for data conferencing) and clicking the ADD button of Share
window 211. One can also save the shared image (SAVE button), load a
previously saved image to be shared (LOAD button), or print an image
(PRINT button).

[0186] While discussing the Expert's advice, field representative
201 makes annotations 222 to image 220 in order to illustrate his
concerns. While responding to the concerns of field representative
201, the Expert hears a beep and receives a visual notice (New Call
window 223) on his screen (not visible to the field representative
and his client), indicating the existence of a new incoming call and
identifying the caller. At this point, the Expert can accept the new
call (ACCEPT button), refuse the new call (REFUSE button, which will
result in a message being displayed on the caller's screen
indicating that the Expert is unavailable) or add the new caller to
the Expert's existing call (ADD button). In this case, the Expert
elects yet another option (not shown) - to defer the call and leave
the caller a standard message that the Expert will call back in X
minutes (in this case, 1 minute). The Expert then elects also to
defer his existing call, telling the field representative and his
client that he will call them back in 5 minutes, and then elects to
return the initial deferred call.
[0187] It should be noted that the Expert's act of deferring a call
results not only in a message being sent to the caller, but also in
the caller's name (and perhaps other information associated with the
call, such as the time the call was deferred or is to be resumed)
being displayed in a list 230 (see Fig. 38) on the Expert's screen
from which the call can be reinitiated. Moreover, the "state" of the
call (e.g., the information being shared) is retained so that it can
be recreated when the call is reinitiated. Unlike a "hold"
(described above), deferring a call actually breaks the logical and
physical connections, requiring that the entire call be reinitiated
by the Collaboration Initiator and the AVNM as described above.
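The distinction between holding and deferring a call can be sketched as follows. This is an illustrative model only: a hold leaves the connection in place, while a defer saves the call state, tears down the connection, and later builds a fresh call from the saved state. The `Call` and `CallDesk` classes and their fields are hypothetical; the actual reinitiation is performed by the Collaboration Initiator and the AVNM.

```python
from dataclasses import dataclass, field


@dataclass
class Call:
    caller: str
    shared_state: dict = field(default_factory=dict)  # e.g. shared image, annotations
    connected: bool = True
    on_hold: bool = False


class CallDesk:
    """Tracks deferred calls so they can be reinitiated with their state."""

    def __init__(self):
        self.deferred = {}  # caller name -> saved call state

    def hold(self, call):
        call.on_hold = True  # the connection itself stays up

    def defer(self, call, callback_minutes):
        self.deferred[call.caller] = dict(call.shared_state)  # shallow copy
        call.connected = False  # logical and physical connections are broken
        return f"The Expert will call back in {callback_minutes} minutes"

    def reinitiate(self, caller):
        # A brand-new call is set up and the saved state is restored into it.
        state = self.deferred.pop(caller)
        return Call(caller=caller, shared_state=state)
```

In this model the list of keys in `deferred` plays the role of list 230 on the Expert's screen.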

[0188] Upon returning to the initial deferred call, the Expert
engages in a videoconference with caller 231, a research analyst who
is located 10 floors up from the Expert, with a complex question
regarding a particular security. Caller 231 decides to add London
expert 232 to the videoconference (via the ADD button in
Collaboration Initiator window 204) to provide additional
information regarding the factual history of the security. Upon
selecting the ADD button, video window 203 now displays, as
illustrated in Fig. 38, a video mosaic consisting of three smaller
images (instead of a single large image displaying only caller 231)
of the Expert 233, caller 231 and London expert 232.
[0189] During this videoconference, an urgent PRIORITY request (New
Call window 234) is received from the Expert's boss (who is engaged
in a three-party videoconference call with two members of the bank's
operations department and is attempting to add the Expert to that
call to answer a quick question). The Expert puts his three-party
videoconference on hold (merely by clicking the HOLD button in video
window 203) and accepts (via the ACCEPT button of New Call window
234) the urgent call from his boss, which results in the Expert
being added to the boss' three-party videoconference call.

[0190] As illustrated in Fig. 39, video window 203 is now replaced
with a four-person video mosaic representing a four-party conference
call consisting of the Expert 233, his boss 241 and the two members
242 and 243 of the bank's operations department. The Expert quickly
answers the boss' question and, by clicking on the RESUME button (of
video window 203) adjacent to the names of the other participants to
the call on hold, simultaneously hangs up on the conference call
with his boss and resumes his three-party conference call involving
the securities issue, as illustrated in video window 203 of Fig. 40.
[0191] While that call was on hold, however, analyst 231 and London
expert 232 were still engaged in a two-way videoconference (with a
blackened portion of the video mosaic on their screens indicating
that the Expert was on hold) and had shared and annotated a
graphical image 250 (see annotations 251 to image 250 of Fig. 40)
illustrating certain financial concerns. Once the Expert resumed the
call, analyst 231 added the Expert to the share session, causing
Share window 211 containing annotated image 250 to appear on the
Expert's screen. Optionally, snapshot sharing could progress while
the video was on hold.

[0192] Before concluding his conference regarding the securities,
the Expert receives notification of an incoming multimedia mail
message - e.g., a beep accompanied by the appearance of an envelope
252 in the dog's mouth in In Box icon 205 shown in Fig. 40. Once he
concludes his call, he quickly scans his incoming multimedia mail
message by clicking on In Box icon 205, which invokes his mail
software, and then selecting the incoming message for a quick scan,
as generally illustrated in the top two windows of Fig. 2B. He
decides it can wait for further review as the sender is an analyst
other than the one helping on his security question.
[0193] He then reinitiates (by selecting deferred call indicator
230, shown in Fig. 40) his deferred call with field representative
201 and his client 202, as shown in Fig. 41. Note that the full
state of the call is also recreated, including restoration of
previously shared image 220 with annotations 222 as they existed
when the call was deferred (see Fig. 37). Note also in Fig. 41 that,
having reviewed his only unread incoming multimedia mail message, In
Box icon 205 no longer shows an envelope in the dog's mouth,
indicating that the Expert currently has no unread incoming
messages.
[0194] As the Expert continues to provide advice and pricing
information to field representative 201, he receives notification of
three priority calls 261-263 in short succession. Call 261 is the
Head of Sales for the Chicago office. Working at home, she had
instructed her CMW to alert her of all urgent news or messages, and
was subsequently alerted to the arrival of the Expert's earlier
multimedia mail message. Call 262 is an urgent international call.
Call 263 is from the Head of Sales in Los Angeles. The Expert
quickly winds down and then concludes his call with field
representative 201.
[0195] The Expert notes from call indicator 262 that 
this call is not only an international call (shown in the top 
portion of the New Call window), but he realizes it is 
from a laptop user in the field in Central Mexico. The 
Expert elects to prioritize his calls in the following manner:
262, 261 and 263. He therefore quickly answers
call 261 (by clicking on its ACCEPT button) and puts 
that call on hold while deferring call 263 in the manner 
discussed above. He then proceeds to accept the call 
identified by international call indicator 262. 
[0196] Note in Fig. 42 deferred call indicator 271 and the indicator
for the call placed on hold (next to the highlighted RESUME button
in video window 203), as well as the image of caller 272 from the
laptop in the field in Central Mexico. Although Mexican caller 272
is outdoors and has no direct access to any wired telephone
connection, his laptop has two wireless modems permitting dial-up
access to two data connections in the nearest field office (through
which his calls were routed). The system automatically (based upon
the laptop's registered service capabilities) allocated one
connection for an analog telephone voice call (using his laptop's
built-in microphone and speaker and the Expert's computer-integrated
telephony capabilities) to provide audio teleconferencing. The other
connection provides control, data conferencing and one-way digital
video (i.e., the laptop user cannot see the image of the Expert)
from the laptop's built-in camera, albeit at a very slow frame rate
(e.g., 3-10 small frames per second) due to the relatively slow
dial-up phone connection.
[0197] It is important to note that, despite the limited
capabilities of the wireless laptop equipment, the present invention
accommodates such capabilities, supplementing an audio telephone
connection with limited (i.e., relatively slow) one-way video and
data conferencing functionality. As telephony and video compression
technologies improve, the present invention will accommodate such
improvements automatically. Moreover, even with one participant to a
teleconference having limited capabilities, other participants need
not be reduced to this "lowest common denominator." For example,
additional participants could be added to the call illustrated in
Fig. 42 as described above, and such participants could have full
videoconferencing, data conferencing and other collaborative
functionality vis-a-vis one another, while having limited
functionality only with caller 272.
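One way to picture why participants are not reduced to the lowest common denominator is to compute the available services per pair of participants rather than once for the whole call. The sketch below assumes each participant registers the media it can send and the media it can receive; the function name `pairwise_services`, the participant labels and the capability sets are hypothetical and chosen to match the laptop example (audio and data both ways, video outbound only).

```python
def pairwise_services(sends, receives):
    """For each ordered pair (a, b), the media that a can send and b can receive."""
    return {
        (a, b): sends[a] & receives[b]
        for a in sends
        for b in receives
        if a != b
    }


# Illustrative registered capabilities only.
sends = {
    "expert": {"audio", "video", "data"},
    "analyst": {"audio", "video", "data"},
    "laptop": {"audio", "video", "data"},  # slow one-way video from its camera
}
receives = {
    "expert": {"audio", "video", "data"},
    "analyst": {"audio", "video", "data"},
    "laptop": {"audio", "data"},           # cannot display incoming video
}
links = pairwise_services(sends, receives)
```

Here the expert and analyst keep full video with each other, while only the links toward the laptop lose video.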
[0198] As his day evolved, the off-site salesperson
272 in Mexico was notified by his manager through the 
laptop about a new security and became convinced that 
his client would have particular interest in this issue. 
The salesperson therefore decided to contact the 
Expert as shown in Figure 42. While discussing the 
security issues, the Expert again shares all captured 
graphs, charts, etc. 

[0199] The salesperson 272 also needs the Expert's help on another
issue. He has hard copy only of a client's portfolio and needs some
advice on its composition before he meets with the client tomorrow.
He says he will fax it to the Expert for analysis. Upon receiving
the fax on his CMW, via computer-integrated fax, the Expert asks
whether he should send the Mexican caller a "QuickTime" movie (a
lower quality compressed video standard from Apple Computer) on his
laptop tonight or send a higher-quality CD via FedEx tomorrow - the
notion being that the Expert can produce an actual video
presentation with models and annotations in video form. The
salesperson can then play it to his client tomorrow afternoon and it
will be as if the Expert is in the room. The Mexican caller decides
he would prefer the CD.
[0200] Continuing with this scenario, the Expert learns, in the
course of his call with remote laptop caller 272, that he missed an
important issue during his previous quick scan of his incoming
multimedia mail message. The Expert is upset that the sender of the
message did not utilize the "video highlight" feature to highlight
this aspect of the message. This feature permits the composer of the
message to define "tags" (e.g., by clicking a TAG button, not shown)
during record time which are stored with the message along with a
"time stamp," and which cause a predefined or selectable audio
and/or visual indicator to be played/displayed at that precise point
in the message during playback.
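The "video highlight" feature amounts to storing (time stamp, indicator) pairs with the message and firing each indicator when playback reaches its time stamp. The following is a minimal sketch of that bookkeeping, not the patented implementation; the class `TaggedMessage`, its methods and the default indicator name are hypothetical.

```python
import bisect


class TaggedMessage:
    """Stores highlight tags for a recorded message, ordered by time stamp."""

    def __init__(self):
        self._tags = []  # sorted list of (time_stamp_seconds, indicator)

    def tag(self, time_stamp, indicator="beep"):
        """Record a tag at record time (e.g. when a TAG button is clicked)."""
        bisect.insort(self._tags, (time_stamp, indicator))

    def indicators_between(self, start, end):
        """Tags to fire as playback advances from start (inclusive) to end (exclusive)."""
        # A 1-tuple sorts before any 2-tuple with the same first element,
        # so these bounds select exactly start <= time_stamp < end.
        lo = bisect.bisect_left(self._tags, (start,))
        hi = bisect.bisect_left(self._tags, (end,))
        return self._tags[lo:hi]
```

A player would call `indicators_between` once per playback interval and play or display whatever it returns.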
[0201] Because this issue relates to the caller that the Expert has
on hold, the Expert decides to merge the two calls together by
adding the call on hold to his existing call. As noted above, both
the Expert and the previously held caller will have full video
capabilities vis-a-vis one another and will see a three-way mosaic
image (with the image of caller 272 at a slower frame rate), whereas
caller 272 will have access only to the audio portion of this
three-way conference call, though he will have data conferencing
functionality with both of the other participants.

[0202] The Expert forwards the multimedia mail message to both
caller 272 and the other participant, and all three of them review
the video enclosure in greater detail and discuss the concern raised
by caller 272. They share certain relevant data as described above
and realize that they need to ask a quick question of another remote
expert. They add that expert to the call (resulting in the addition
of a fourth image to the video mosaic, also not shown) for less than
a minute while they obtain a quick answer to their question. They
then continue their three-way call until the Expert provides his
advice and then adjourns the call.
[0203] The Expert composes a new multimedia mail message, recording
his image and audio synchronized (as described above) to the screen
displays resulting from his simultaneous interaction with his CMW
(e.g., running a program that performs certain calculations and
displays a graph while the Expert illustrates certain points by
telepointing on the screen, during which time his image and spoken
words are also captured). He sends this message to a number of
salesforce recipients whose identities are determined automatically
by an outgoing mail filter that utilizes a database of information
on each potential recipient (e.g., selecting only those whose
clients have investment policies which allow this type of
investment).
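The outgoing mail filter can be illustrated as a query against a recipient database. The sketch below assumes, purely for illustration, that the database maps each salesperson to a list of client investment policies and that each policy is a set of permitted instrument types; the function `select_recipients` and the sample data are hypothetical.

```python
def select_recipients(database, instrument_type):
    """Names of recipients with at least one client whose policy allows the instrument."""
    return sorted(
        name
        for name, client_policies in database.items()
        if any(instrument_type in allowed for allowed in client_policies)
    )


# Illustrative data only: salesperson -> one set of allowed instruments per client.
salesforce = {
    "sales_nyc": [{"equities", "bonds"}, {"bonds"}],
    "sales_chicago": [{"bonds"}],
    "sales_la": [{"equities", "derivatives"}],
}
```

With this data, a message about derivatives would be addressed only to `sales_la`.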
[0204] The Expert then receives an audio and visual reminder (not
shown) that a particular video feed (e.g., a short segment of a
financial cable television show featuring new financial instruments)
will be triggered automatically in a few minutes. He uses this time
to search his local securities database, which is dynamically
updated from financial information feeds (e.g., prepared from a
broadcast textual stream of current financial events with indexed
headers that automatically applies data filters to select incoming
events relating to certain securities). The video feed is then
displayed on the Expert's screen and he watches this short video
segment.

[0205] After analyzing this extremely up-to-date information, the
Expert then reinitiates his previously deferred call, from indicator
271 shown in Fig. 42, which he knows is from the Head of Sales in
Los Angeles, who is seeking to provide his prime clients with
securities advice on another securities transaction based upon the
most recent available information. The Expert's call is not answered
directly, though he receives a short prerecorded video message (left
by the caller who had to leave his home for a meeting across town
soon after his priority message was deferred) asking that the Expert
leave him a multimedia mail reply message with advice for a
particular client, and explaining that he will access this message
remotely from his laptop as soon as his meeting is concluded. The
Expert complies with this request and composes and sends this mail
message.
[0206] The Expert then receives an audio and visual reminder on his
screen indicating that his office hours will end in two minutes. He
switches from "intercom" mode to "telephone" mode so that he will no
longer be disturbed without an opportunity to reject incoming calls
via the New Call window described above. He then receives and
accepts a final call concerning an issue from an electronic meeting
several months ago, which was recorded in its entirety.
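The difference between "intercom" and "telephone" mode, together with the New Call window choices seen earlier in the scenario, can be sketched as a small decision function. This is a simplified reading of the scenario, not a specification: it assumes that intercom mode connects a call automatically only when the user is not already in a call, and the function name, the `busy` flag and the outcome strings are hypothetical.

```python
OUTCOMES = {
    "ACCEPT": "connected",
    "REFUSE": "caller told user is unavailable",
    "ADD": "added to existing call",
    "DEFER": "caller told user will call back",
}


def handle_incoming(mode, busy=False, choice=None):
    """Return what happens to a new incoming call.

    mode:   "intercom" or "telephone"
    busy:   True if the user is already in a call
    choice: button pressed in the New Call window, if one is shown
    """
    if mode not in ("intercom", "telephone"):
        raise ValueError(f"unknown mode: {mode}")
    if mode == "intercom" and not busy:
        return "connected"  # appears automatically, with no answer step
    if choice is None:
        return "awaiting choice in New Call window"
    return OUTCOMES[choice]
```

In telephone mode every call passes through the New Call window, which is what lets the Expert reject calls once his office hours end.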

[0207] The Expert accesses this recorded meeting 
from his "corporate memory". He searches the recorded 
meeting (which appears in a second video window on 
his screen as would a live meeting, along with standard 
controls for stop/play/rewind/fast forward/etc.) for an
event that will trigger his memory using his fast forward 
controls, but cannot locate the desired portion of the 
meeting. He then elects to search the ASCII text log 
(which was automatically extracted in the background 
after the meeting had been recorded, using the latest 
voice recognition techniques), but still cannot locate the 
desired portion of the meeting. Finally, he applies an 
information filter to perform a content-oriented (rather 
than literal) search and finds the portion of the meeting 
he was seeking. After quickly reviewing this short por- 
tion of the previously recorded meeting, the Expert 
responds to the caller's question, adjourns the call and 
concludes his office hours. 
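The contrast between a literal search of the text log and a content-oriented search can be illustrated with a toy example. The specification does not say how its information filter works, so the keyword-overlap test below is only a stand-in; the functions `literal_search` and `content_search`, the `min_hits` threshold and the sample log are all hypothetical.

```python
def literal_search(log, phrase):
    """Offsets (seconds) of log entries containing the exact phrase."""
    phrase = phrase.lower()
    return [t for t, text in log if phrase in text.lower()]


def content_search(log, topic_terms, min_hits=2):
    """Offsets of entries that mention at least `min_hits` distinct topic terms."""
    hits = []
    for t, text in log:
        words = set(text.lower().split())
        if len(words & topic_terms) >= min_hits:
            hits.append(t)
    return hits


# Illustrative transcript: (offset in seconds, recognized text).
text_log = [
    (65.0, "we reviewed the quarterly sales figures"),
    (310.0, "the bond coupon may reset if rates move"),
    (742.0, "lunch is at noon"),
]
```

A literal search for "interest rate risk" finds nothing in this log, whereas a content search over related terms such as "bond", "coupon", "rates" and "yield" locates the entry at 310 seconds.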

[0208] It should be noted that the above scenario involves many
state-of-the-art desktop tools (e.g., video and information feeds,
information filtering and voice recognition) that can be leveraged
by our Expert during videoconferencing, data conferencing and other
collaborative activities provided by the present invention - because
this invention, instead of providing a dedicated videoconferencing
system, provides a desktop multimedia collaboration system that
integrates into the Expert's existing workstation/LAN/WAN
environment.
[0209] It should also be noted that all of the preceding
collaborative activities in this scenario took place during a
relatively short portion of the Expert's day (e.g., less than an
hour of cumulative time) while the Expert remained in his office and
continued to utilize the tools and information available from his
desktop. Prior to this invention, such a scenario would not have
been possible because many of these activities could have taken
place only with face-to-face collaboration, which in many
circumstances is not feasible or economical and which thus may well
have resulted in a loss of the associated business opportunities.
[0210] Although the present invention has been described in
connection with particular preferred embodiments and examples, it is
to be understood that many modifications and variations can be made
in hardware, software, operation, uses, protocols and data formats
without departing from the scope to which the inventions disclosed
herein are entitled. For example, for certain applications, it will
be useful to provide some or all of the audio/video signals in
digital form. Accordingly, the present invention is to be considered
as including all apparatus and methods encompassed by the appended
claims.

Claims 

1. A teleconferencing system for conducting a teleconference
among a plurality of participants, comprising:

(a) a plurality of workstations (12) each having monitors
(200) for displaying visual images, and associated AV
capture (500, 600) and reproduction (20, 700) capabilities
for capturing and reproducing video images and spoken
audio of the participants; and

(b) a common collaboration initiator (161) for initiating a
plurality of types of collaboration among the plurality of
participants, the types of collaboration including data
conferencing, videoconferencing, telephone conferencing,
and the sending of faxes and multimedia mail messages,
said common collaboration initiator (161) including

(i) a participant selector (161, 66, 63, 206) for selecting
one or more desired participants from among a plurality
of potential participants; and

(ii) a collaboration type selector (160, 204) for selecting
a desired collaboration type from among said plurality
of collaboration types.

2. The teleconferencing system of claim 1, wherein
said participant selector (161, 66, 63, 206) includes:

(a) a rolodex selector (206) for selecting one or
more desired participants from a first set of
said potential participants; and

(b) a quick-dial selector (204) for selecting one
or more desired participants from a second set
of potential participants, said second set being
a subset of said first set.

3. The teleconferencing system of claim 2, wherein:

(a) said rolodex selector (206) includes names
of the potential participants in said first set; and

(b) said quick-dial selector (204) includes icons
representing the potential participants in said
second set.

4. The teleconferencing system of claim 2, wherein
said rolodex (206) and quick-dial (204) selectors
have associated collaboration type selector buttons
representing said collaboration types.

5. The teleconferencing system of claim 2, wherein
said rolodex (206) and quick-dial (204) selectors
appear in the same window on a workstation monitor
(200).

6. The teleconferencing system of claim 1, wherein
said common collaboration initiator (161) can be invoked by
a single user action for selecting each of said
desired participants, a single user action for selecting
said desired collaboration type, and, if said
desired collaboration type is not videoconferencing
or telephone conferencing, an additional single
user action for selecting information to be sent to at
least one of said desired participants.

7. The teleconferencing system of claim 1, wherein
said common collaboration initiator (161) can be 
invoked by a single user action for selecting one of 
said participants and a default collaboration type. 

8. The teleconferencing system of any one of the preceding
claims, further comprising:

(a) an add participant selection mechanism
(160, 204, 63, 204) for selecting a new participant
from among a plurality of potential participants
and adding said new participant to an
active teleconference call.

9. The teleconferencing system of any one of the preceding
claims, further comprising:

(a) a teleconferencing manager (160, 204, 62,
63) for managing a teleconference among said
plurality of participants, wherein at least one of
said participants can be a multimedia service
(502) either:

(i) providing audio and/or video signals for
reproduction at the workstation (12) of
another of said participants; or

(ii) receiving video images and/or spoken
audio of another of said participants.

10. The teleconferencing system of any one of the preceding
claims, including an AV path (13b, 14) for
carrying AV signals among the workstations (12),
the AV signals representing video images and/or
spoken audio of the participants, wherein the AV
path (13b) is implemented with unshielded twisted
pair wiring.

11. A teleconferencing system for conducting a teleconference
among a plurality of participants having workstations with
associated monitors for displaying visual images, and with
associated AV capture and reproduction capabilities for
capturing and reproducing video images and spoken audio of
said participants, said workstations being interconnected
by a first network, said network providing a data path for
carrying digital data signals among said workstations, the
teleconferencing system comprising:

(a) a common collaboration initiator for initiating
a plurality of types of collaboration among
said plurality of participants, said types of
collaboration being selected from the set consisting
of data conferencing, videoconferencing,
telephone conferencing, the sending of faxes
and the sending of multimedia mail messages,
said common collaboration initiator including:

(i) a participant selector for selecting one 
or more desired participants from among a 
plurality of potential participants; and 

(ii) a collaboration type selector for select- 
ing a desired collaboration type from 
among said plurality of collaboration types. 

12. The teleconferencing system of claim 11, said
participant selector having:

(a) a rolodex selector for selecting one or more 
desired participants from a first set of said 
potential participants; and 

(b) a quick-dial selector for selecting one or 
more desired participants from a second set of 
potential participants, said second set being a 
subset of said first set. 

13. The teleconferencing system of claim 11, wherein
said common collaboration initiator can be invoked
by a user action for selecting one of said participants
and a default collaboration type.